Abstract



Ionides et al. [12, 13] have recently introduced an original approach to perform maximum likelihood
parameter estimation in state-space models which only requires being able to simulate the latent Markov
model according to its prior distribution. Their methodology relies on an approximation of the score vector
for general statistical models based upon an artificial posterior distribution and bypasses the calculation of
any derivative. Building upon this insightful work, we provide here a simple "derivative-free" estimator of the
observed information matrix based upon this very artificial posterior distribution. However, for state-space
models where sequential Monte Carlo computation is required, these estimators have too high a variance and
need to be modified. In this specific context, we derive new derivative-free estimators of the score vector and
observed information matrix which are computed using sequential Monte Carlo approximations of smoothed
additive functionals associated with a modified version of the original state-space model.

Keywords: Maximum likelihood, Score vector, Observed information matrix, Sequential Monte Carlo,
Smoothing, State-space models.



1 Introduction 



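Consider a random variable $Y$ taking values in a measurable space $\mathcal{Y}$. Given $\theta \in \mathbb{R}^d$, we assume that $Y$ follows
a probability density function $p_Y(y;\theta)$ w.r.t. a $\sigma$-finite dominating measure denoted $\nu(dy)$. Given $Y = y$, the
likelihood is denoted by $\mathcal{L}(\theta) = p_Y(y;\theta)$ and the log-likelihood by $\ell(\theta) = \log p_Y(y;\theta)$. Assuming that $\ell(\theta)$ is
twice differentiable, we are interested in calculating the score vector $\ell^{(1)}(\theta)$ and the observed information matrix
$-\ell^{(2)}(\theta)$ whose $r$-th component $\ell_r(\theta)$ and $(r,s)$-th component $-\ell_{r,s}(\theta)$ are given for $r, s = 1, \ldots, d$ by

$$ \ell_r(\theta) = \frac{\partial \ell(\theta)}{\partial \theta_r}, \qquad -\ell_{r,s}(\theta) = -\frac{\partial^2 \ell(\theta)}{\partial \theta_r\,\partial \theta_s}. \tag{1} $$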



The score vector and observed information matrix are useful both algorithmically and statistically.
Algorithmically, they can be used to build efficient maximum likelihood estimation techniques as in [12, 13] or to
build efficient MCMC proposals relying on the local geometry of the target distribution [10]. Statistically, the
observed information matrix can be used to estimate the variance of the maximum likelihood estimate.

Exact calculations of the score vector and observed information matrix are only possible for models where
$\ell(\theta)$ can be evaluated exactly. For complex latent variable models, these quantities are typically computed
using Monte Carlo approximations of the Fisher and Louis identities [4, 17]. However there are many important 
scenarios where this is not even a viable option. For example for numerous state-space models arising in 
applied science, we are only able to obtain sample paths from the latent Markov process but we have access 
to the expression of neither its transition kernel nor its derivatives [12, 13]. This prohibits the numerical 
implementation of the Fisher and Louis identities. It is thus useful to develop a simple method to obtain 
estimates of the score vector and observed information matrix which, beyond the specification of the statistical 
model, requires a minimum amount of input from the user but can outperform finite difference approximations. 

For the score vector, such a method has been recently proposed in [12, 13]. The main idea of the authors is
to introduce an artificial random parameter $\Theta$ with prior centred around $\theta$. They establish that the expectation
of $\Theta - \theta$ w.r.t. the posterior associated to this prior and the likelihood $\mathcal{L}(\theta)$ has components approximately
proportional to the components of $\ell^{(1)}(\theta)$; the approximation improves as the artificial prior shrinks around
$\theta$. In a state-space context where sequential Monte Carlo approximations are required, the direct application of
this idea provides a high variance estimator. The authors propose a lower variance estimator which is computed
using the optimal filter associated to a modified version of the original state-space model where an artificial
random walk dynamics initialized at the parameter $\theta$ is introduced.

In this paper, our contributions are two-fold. First, in Section 2, we extend the idea in [12, 13] for
approximating the score vector to the approximation of the observed information matrix. We show that the latter is
directly related to the covariance of the artificial posterior associated to $\Theta$ when the prior is carefully selected.
Additionally we sharpen the theoretical results provided in [13] and use these results to compare the proposed
estimators to finite difference estimators in terms of optimal rates of convergence of the mean squared error.
These results hold for general statistical models. Second, in the specific context of state-space models, we
propose in Section 3 original estimators of the score vector and observed information matrix. These are computed
using the optimal smoother associated to a modified state-space model which enjoys nicer theoretical properties
than the one considered in [13]. This allows us to obtain quantitative bounds for the sequential Monte Carlo
implementations of the estimators.

All proofs are postponed to the appendix. 

2 Derivative-free estimates of the score vector and observed information matrix

2.1 An artificial Bayesian model 

We follow here the approach initiated in [13] and introduce a stochastically perturbed version of the original
model corresponding to a pair of random variables $(\Theta, Y)$ having a joint probability density on $\mathbb{R}^d \times \mathcal{Y}$

$$ p_{\Theta,Y}(\vartheta, y; \theta, \tau) = \tau^{-d}\,\kappa\{\tau^{-1}(\vartheta - \theta)\}\, p_Y(y; \vartheta) \tag{2} $$

where $\tau > 0$ is a scale parameter and $\kappa(\cdot)$ a probability density on $\mathbb{R}^d$.

Our main result in this section relates the expectation of an arbitrary function $h(\Theta - \theta)$ w.r.t. the posterior
$p_{\Theta|Y}(\vartheta \mid y; \theta, \tau)$ of $\Theta$ given $Y = y$ defined through equation (2) to the entries of the score vector and observed
information matrix given in (1). We will denote by $\mathbb{E}_{\theta,\tau}$ the expectation w.r.t. this artificial posterior. To
present this result, we need to introduce some additional notation. If $f : \mathbb{R}^d \to \mathbb{R}$ is sufficiently many times differentiable,
its $k$-th order differential at $\theta$ is a $k$-linear map from $(\mathbb{R}^d)^k$ to $\mathbb{R}$ denoted $f^{(k)}(\theta)$ and we write

$$ f^{(k)}(\theta).u^{\otimes k} = \sum_{1 \le i_1, \ldots, i_k \le d} \frac{\partial^k f(\theta)}{\partial\theta_{i_1} \cdots \partial\theta_{i_k}}\, u_{i_1} \cdots u_{i_k} \tag{3} $$

where $u_i$ denotes the $i$-th component of $u$. For any vector $u \in \mathbb{R}^d$ and matrix $v \in \mathbb{R}^{d \times d}$ we denote by $|u|$ and
$|v|$ the $L_1$-norm: $|u| = \sum_{i=1}^d |u_i|$ and $|v| = \sum_{i=1}^d \sum_{j=1}^d |v_{ij}|$. For a vector-valued function $f = (f_1, \ldots, f_n)^{\top}$, we
write $\int f(u)\,du$ for the vector $\left(\int f_1(u)\,du, \ldots, \int f_n(u)\,du\right)^{\top}$.
Our results rely on the following assumptions. 

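Assumption 1 $\kappa$ is a symmetric probability density on $\mathbb{R}^d$ w.r.t. Lebesgue measure. For any $k \ge 1$,
$1 \le i_1, \ldots, i_k \le d$ and $\beta_1, \ldots, \beta_k \ge 1$ there exists $C(i_1, \ldots, i_k, \beta_1, \ldots, \beta_k) < \infty$ such that

$$ \int \left| u_{i_1}^{\beta_1} u_{i_2}^{\beta_2} \cdots u_{i_k}^{\beta_k} \right| \kappa(u)\,du \le C(i_1, \ldots, i_k, \beta_1, \ldots, \beta_k) $$

and the covariance matrix $\Sigma = (\sigma_{i,j})_{i,j=1}^{d}$ associated to $\kappa$ is non-singular so $\sigma_i^2 := \sigma_{i,i} > 0$.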
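Assumption 2 $\kappa$ is such that

$$ \exists\, \gamma, \delta, M > 0 \quad \forall u \in \mathbb{R}^d \qquad |u| > M \;\Rightarrow\; \kappa(u) < e^{-\gamma |u|^{\delta}}. $$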



Assumption 3 The log-likelihood function $\ell : \mathbb{R}^d \to \mathbb{R}$ is four times continuously differentiable and, for $\delta$
defined as in Assumption 2, the associated likelihood $\mathcal{L} : \mathbb{R}^d \to \mathbb{R}$ satisfies:

$$ \forall \theta \in \mathbb{R}^d \quad \exists\, 0 < \eta < \delta \quad \exists\, \epsilon, D > 0 \quad \forall u \in \mathbb{R}^d \qquad \mathcal{L}(\theta + u) < D e^{\epsilon |u|^{\eta}}. $$

Assumptions 2 and 3 ensure that if the likelihood function is not bounded then it goes to infinity slowly
enough so that the prior $\kappa$ "compensates". These assumptions are not restrictive given that $\kappa$ can be selected by the
user; see however the comment after Assumption 4. The following result then holds.



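Theorem 1 Suppose Assumptions 1-2-3. For any $\theta \in \mathbb{R}^d$ and $h : \mathbb{R}^d \to \mathbb{R}^m$ satisfying

$$ |h(u)| \le c\,|u|^{\alpha} \tag{4} $$

for some constants $\alpha \ge 0$ and $c > 0$, we have

$$
\begin{aligned}
\mathbb{E}_{\theta,\tau}\{h(\Theta - \theta) \mid Y = y\}
 ={}& \int h(\tau u)\,\kappa(u)\,du + \tau \int h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ \tau^2 \int \Big\{ h(\tau u) - \int h(\tau v)\,\kappa(v)\,dv \Big\} \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \\
 &+ \tau^3 \int h(\tau u) \Big\{ \tfrac{1}{3!}\big(\ell^{(1)}(\theta).u\big)^3 + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)\big(\ell^{(2)}(\theta).u^{\otimes 2}\big) + \tfrac{1}{3!}\ell^{(3)}(\theta).u^{\otimes 3} \Big\} \kappa(u)\,du \\
 &- \tau^3 \int \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \int h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ O(\tau^{4+\alpha}).
\end{aligned}
$$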



All our results are presented as asymptotic expansions for each $\theta$; we could have also presented them as
uniform upper bounds on the remainder term, for any $\theta \in K$ in some compact set $K$ as in [13].

2.2 Approximation of the score vector and observed information matrix 

We detail here two useful consequences of Theorem 1. The first result is a strengthened version of the main
result provided in [13] which shows that the rescaled components of $\mathbb{E}_{\theta,\tau}(\Theta - \theta \mid Y = y)$ are approximately
proportional to the score vector. This is established by applying Theorem 1 to the function $h(u) = u$.

Theorem 2 Suppose Assumptions 1-2-3. For any $\theta \in \mathbb{R}^d$, there exist $\eta > 0$ and $C < \infty$ such that for all
$0 < \tau < \eta$

$$ \left| \ell^{(1)}(\theta) - \tau^{-2}\,\Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta - \theta \mid Y = y) \right| \le C\tau^2. \tag{5} $$



Whereas the upper bound on the r.h.s. of (5) provided in [13] is of order $\tau$, Theorem 1 shows that it is
actually of order $\tau^2$. This sharper bound is crucial when comparing this estimator of the score theoretically to
finite differences (see Section 2.4). It is additionally possible to approximate the observed information matrix by
rescaling the elements of the posterior covariance $\operatorname{Cov}_{\theta,\tau}(\Theta \mid Y = y)$. However, this requires an additional assumption
on the artificial prior $\kappa$ that is for example verified when $\kappa$ is a multivariate normal with diagonal covariance
matrix.

Assumption 4 $\kappa$ satisfies $\kappa(u) = \prod_{i=1}^{d} \kappa_i(u_i)$ and is mesokurtic, that is

$$ \Lambda_i := \int u_i^4\,\kappa(u)\,du = 3\sigma_i^4. $$
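To make the mesokurtic condition concrete, the following sketch compares the fourth moment with three times the squared variance for two symmetric one-dimensional densities, using a plain midpoint rule. The function names and grid sizes are illustrative assumptions and do not come from the paper.

```python
import math

def moments(density, lo, hi, n=200_000):
    """Midpoint-rule estimates of the second and fourth moments of a 1-D density."""
    step = (hi - lo) / n
    m2 = m4 = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * step
        w = density(u) * step
        m2 += u * u * w
        m4 += u ** 4 * w
    return m2, m4

def gaussian(u):
    # standard normal density
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def uniform(u, a=1.0):
    # uniform density on [-a, a]
    return 0.5 / a if -a <= u <= a else 0.0

g2, g4 = moments(gaussian, -12.0, 12.0)
u2, u4 = moments(uniform, -1.0, 1.0)
gaussian_ratio = g4 / g2 ** 2   # fourth moment over sigma^4, close to 3
uniform_ratio = u4 / u2 ** 2    # close to 9/5, so the uniform is not mesokurtic
```

Under these assumptions a normal prior satisfies the condition, whereas a uniform prior, although symmetric with finite moments, does not.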



Note that choosing a multivariate normal distribution for $\kappa$ in order to satisfy Assumption 4 makes the
constraints on the likelihood brought by Assumption 3 more explicit. In this context, we obtain the following
result by applying Theorem 1 to $h(u) = uu^{\top}$.

Theorem 3 Suppose Assumptions 1-2-3-4. For any $\theta \in \mathbb{R}^d$, there exist $\eta > 0$ and $C < \infty$ such that for all
$0 < \tau < \eta$

$$ \left| -\ell^{(2)}(\theta) + \tau^{-4}\,\Sigma^{-1} \left\{ \operatorname{Cov}_{\theta,\tau}(\Theta \mid Y = y) - \tau^2 \Sigma \right\} \Sigma^{-1} \right| \le C\tau^2. $$
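The two approximations can be checked numerically in one dimension without any Monte Carlo error. The sketch below is an illustration under stated assumptions rather than the paper's procedure: it takes $\kappa$ to be the standard normal density (so $\Sigma = 1$), uses a Cauchy log-likelihood chosen only because its derivatives are available in closed form, and computes the artificial posterior moments by trapezoidal quadrature. All names are hypothetical.

```python
import math

def artificial_posterior_estimates(loglik, theta, tau, half_width=10.0, n=4000):
    """Score and observed-information approximations for scalar theta.

    Assumes kappa = N(0, 1), so Sigma = 1, and integrates over
    u = (vartheta - theta) / tau with the trapezoidal rule.
    """
    step = 2.0 * half_width / n
    base = loglik(theta)
    z = m1 = m2 = 0.0
    for i in range(n + 1):
        u = -half_width + i * step
        w = step * (0.5 if i in (0, n) else 1.0)
        # unnormalised posterior density of u: likelihood ratio times Gaussian prior
        w *= math.exp(loglik(theta + tau * u) - base - 0.5 * u * u)
        z += w
        m1 += u * w
        m2 += u * u * w
    mean_u = m1 / z
    var_u = m2 / z - mean_u ** 2
    score = mean_u / tau                 # tau^-2 * E(Theta - theta | y)
    info = -(var_u - 1.0) / tau ** 2     # -tau^-4 * {Cov(Theta | y) - tau^2}
    return score, info

y = 0.5

def cauchy_loglik(theta):
    return -math.log(1.0 + (y - theta) ** 2)

# closed-form derivatives of the Cauchy log-likelihood at theta = 0
exact_score = 2.0 * y / (1.0 + y * y)
exact_info = (2.0 - 2.0 * y * y) / (1.0 + y * y) ** 2

s1, i1 = artificial_posterior_estimates(cauchy_loglik, 0.0, 0.05)
s2, i2 = artificial_posterior_estimates(cauchy_loglik, 0.0, 0.025)
error_ratio = (s1 - exact_score) / (s2 - exact_score)
```

Halving $\tau$ should divide the score error by roughly four, which is what an error of order $\tau^2$ predicts.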



2.3 Latent variable models and Monte Carlo estimates 

The approximations of $\ell^{(1)}(\theta)$ and $\ell^{(2)}(\theta)$ presented in Theorems 2 and 3 will be primarily useful for latent
variable models where we have random variables $(X, Y)$ taking values in a measurable space $\mathcal{X} \times \mathcal{Y}$ and following
a probability density function

$$ p_{X,Y}(x, y; \theta) = p_X(x; \theta)\, p_{Y|X}(y \mid x; \theta) \tag{6} $$

w.r.t. a $\sigma$-finite product dominating measure denoted $\lambda(dx)\,\nu(dy)$ that is parameterised by $\theta \in \mathbb{R}^d$. Here $X$ is
a latent variable and, given $Y = y$, we have $\ell(\theta) = \log p_Y(y; \theta)$ where $p_Y(y; \theta) = \int p_{X,Y}(x, y; \theta)\,\lambda(dx)$.

In this context, the artificial Bayesian model corresponds to a triplet of random variables $(\Theta, X, Y)$ having
a joint probability density on $\mathbb{R}^d \times \mathcal{X} \times \mathcal{Y}$

$$ p_{\Theta,X,Y}(\vartheta, x, y; \theta, \tau) = \tau^{-d}\,\kappa\{\tau^{-1}(\vartheta - \theta)\}\, p_{X,Y}(x, y; \vartheta). $$

Theorems 2 and 3 still hold, and $\ell^{(1)}(\theta)$ and $\ell^{(2)}(\theta)$ can be estimated by performing a Monte Carlo
approximation of the posterior $p_{\Theta,X|Y}(\vartheta, x \mid y; \theta, \tau)$, hence of its marginal $p_{\Theta|Y}(\vartheta \mid y; \theta, \tau)$, and then estimating the
associated posterior mean $\mathbb{E}_{\theta,\tau}(\Theta \mid Y = y)$ and covariance $\operatorname{Cov}_{\theta,\tau}(\Theta \mid Y = y)$.
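As a minimal sketch of such a Monte Carlo approximation, consider self-normalised importance sampling on a toy latent variable model. The model ($X \sim N(\vartheta, 1)$, $Y \mid X = x \sim N(x, 1)$), the choice $\kappa = N(0,1)$ and every name below are assumptions made for illustration. Only simulation of $X$ and pointwise evaluation of $p_{Y|X}$ are used, and since marginally $Y \sim N(\theta, 2)$ the exact score is $(y - \theta)/2$ and the exact second derivative is $-1/2$.

```python
import math
import random

def importance_sampling_estimates(y, theta, tau, n, seed=0):
    """Derivative-free estimates of the first and second derivatives of the
    log-likelihood for the toy model X ~ N(vartheta, 1), Y | X = x ~ N(x, 1)."""
    rng = random.Random(seed)
    sw = sd = sdd = 0.0
    for _ in range(n):
        dev = tau * rng.gauss(0.0, 1.0)           # Theta - theta drawn from the prior
        x = theta + dev + rng.gauss(0.0, 1.0)     # latent variable simulated given Theta
        w = math.exp(-0.5 * (y - x) ** 2)         # weight p(y | x), constant dropped
        sw += w
        sd += w * dev
        sdd += w * dev * dev
    mean = sd / sw                                # posterior mean of Theta - theta
    var = sdd / sw - mean ** 2                    # posterior variance of Theta
    score = mean / tau ** 2
    hessian = (var - tau ** 2) / tau ** 4
    return score, hessian

score_hat, hessian_hat = importance_sampling_estimates(1.0, 0.0, 0.5, 200_000)
```

With $\tau = 0.5$ both estimates carry a visible bias of order $\tau^2$ (they concentrate near $\pm 1/2.25$ rather than $\pm 1/2$); shrinking $\tau$ reduces the bias at the price of a larger Monte Carlo variance.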

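2.4 Comparison with finite difference schemes

An alternative "derivative-free" approach to compute $\ell^{(1)}(\theta)$ and $\ell^{(2)}(\theta)$ consists of using finite difference schemes
combined with Monte Carlo estimates of $\ell(\theta)$; see for example [2]. For the sake of simplicity, consider the case where
$\theta \in \mathbb{R}$.

The central finite difference estimator of $\ell^{(1)}(\theta)$ and second central finite difference estimator of $\ell^{(2)}(\theta)$ are
given by

$$ \hat\ell^{(1)}_{N,h}(\theta) = \frac{\hat\ell_N(\theta + h) - \hat\ell_N(\theta - h)}{2h}, \qquad \hat\ell^{(2)}_{N,h}(\theta) = \frac{\hat\ell_N(\theta + h) - 2\hat\ell_N(\theta) + \hat\ell_N(\theta - h)}{h^2} \tag{7} $$

where $\hat\ell_N(\theta + h)$, $\hat\ell_N(\theta - h)$ and $\hat\ell_N(\theta)$ are independent Monte Carlo estimates using $N$ samples. In most
applications, these three estimators have both a bias and a variance of order $N^{-1}$. It can then be shown under
mild additional regularity assumptions that the optimal rates of convergence of the mean squared error for
$\hat\ell^{(1)}_{N,h}(\theta)$ and $\hat\ell^{(2)}_{N,h}(\theta)$ are

$$ \mathbb{E}\{\hat\ell^{(1)}_{N,h}(\theta) - \ell^{(1)}(\theta)\}^2 \sim N^{-2/3} \quad \text{and} \quad \mathbb{E}\{\hat\ell^{(2)}_{N,h}(\theta) - \ell^{(2)}(\theta)\}^2 \sim N^{-1/2} $$

for $h \sim N^{-1/6}$ and $h \sim N^{-1/8}$; see [2], Chapter 7, Section 1 for details. Results provided in Appendix B indicate
that the optimal rates of convergence of the mean squared error of the estimators

$$ \hat\ell^{(1)}_{N,\tau}(\theta) = \tau^{-2}\,\Sigma^{-1}\{\hat\mu_{N,\tau}(\theta) - \theta\}, \qquad \hat\ell^{(2)}_{N,\tau}(\theta) = \tau^{-4}\,\Sigma^{-1}\{\hat v_{N,\tau}(\theta) - \tau^2\Sigma\}\,\Sigma^{-1} \tag{8} $$

with $\hat\mu_{N,\tau}(\theta)$ and $\hat v_{N,\tau}(\theta)$ being importance sampling estimators of $\mathbb{E}_{\theta,\tau}(\Theta \mid Y = y)$ and $\operatorname{Cov}_{\theta,\tau}(\Theta \mid Y = y)$ are
similar to the finite difference optimal rates, provided $\tau \sim N^{-1/6}$ and $\tau \sim N^{-1/8}$ respectively.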
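For reference, the central finite difference scheme can be sketched as follows. The function names, and the constants `c1` and `c2` in front of the step sizes, are hypothetical; the step-size orders $N^{-1/6}$ and $N^{-1/8}$ are the ones obtained by balancing a squared bias of order $h^4$ against variances of order $N^{-1}h^{-2}$ and $N^{-1}h^{-4}$ respectively.

```python
def central_differences(loglik_estimate, theta, h):
    """First and second central finite differences of a (possibly noisy) log-likelihood."""
    up = loglik_estimate(theta + h)
    mid = loglik_estimate(theta)
    down = loglik_estimate(theta - h)
    first = (up - down) / (2.0 * h)
    second = (up - 2.0 * mid + down) / (h * h)
    return first, second

def rate_optimal_steps(n, c1=1.0, c2=1.0):
    """Step sizes of order N^(-1/6) and N^(-1/8); c1 and c2 stand for unknown constants."""
    return c1 * n ** (-1.0 / 6.0), c2 * n ** (-1.0 / 8.0)

# Noise-free check on the exact log-likelihood of Y ~ N(theta, 2) at y = 1:
# the scheme is exact for quadratics, so it returns 0.5 and -0.5 up to rounding.
first, second = central_differences(lambda t: -0.25 * (1.0 - t) ** 2, 0.0, 0.1)
h1, h2 = rate_optimal_steps(10 ** 6)
```

In practice `loglik_estimate` would return an independent Monte Carlo estimate at each call, which is what makes the choice of $h$ delicate.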

As pointed out in [2], these results are "rather academic" as the constants in front of the optimal $h$ or $\tau$
depend on unknown parameters. Moreover, as observed experimentally in [13] in the context of state-space
models, $\hat\ell^{(1)}_{N,\tau}(\theta)$ can significantly outperform $\hat\ell^{(1)}_{N,h}(\theta)$, as $\hat\ell^{(1)}_{N,h}(\theta)$ involves running two independent sequential
Monte Carlo filters providing a high variance estimate of the numerator of (7) whereas variance reduction
techniques positively correlating $\hat\ell_N(\theta + h)$ and $\hat\ell_N(\theta - h)$ proposed in [15] are not applicable in this model-free
context.

3 Estimation of the score vector and observed information matrix for 
state-space models 

3.1 State-space models 

Let $\{X_t, Y_t\}_{t \in \mathbb{N}}$ be a stochastic process such that $(X_t, Y_t)$ takes values in a measurable space $\mathcal{X} \times \mathcal{Y}$. For any
sequence $\{z_k\}$, let $z_{i:j}$ denote $(z_i, z_{i+1}, \ldots, z_j)$. The model is specified as follows: $\{X_t\}_{t \in \mathbb{N}}$ is a latent Markov
process of initial density $\mu(x; \theta)$ and homogeneous Markov transition density $f(x' \mid x; \theta)$ w.r.t. a dominating
measure $\lambda(dx)$ whereas the observations $\{Y_t\}_{t \in \mathbb{N}}$ are assumed to be conditionally independent given $\{X_t\}_{t \in \mathbb{N}}$
of conditional density $g(y_t \mid x_t; \theta)$ w.r.t. a dominating measure $\nu(dy)$; that is $X_1 \sim \mu(\cdot\,; \theta)$ and for $t = 1, 2, \ldots$

$$ X_{t+1} \mid (X_t = x_t) \sim f(\cdot \mid x_t; \theta), \qquad Y_t \mid (X_t = x_t) \sim g(\cdot \mid x_t; \theta). \tag{9} $$

It follows that the joint density of $(X_{1:T}, Y_{1:T})$ is given by

$$ p_{X_{1:T},Y_{1:T}}(x_{1:T}, y_{1:T}; \theta) = \mu(x_1; \theta) \prod_{t=2}^{T} f(x_t \mid x_{t-1}; \theta) \prod_{t=1}^{T} g(y_t \mid x_t; \theta). \tag{10} $$

For a realization $Y_{1:T} = y_{1:T}$ of the observations, the log-likelihood function satisfies

$$ \ell(\theta) = \log p_{Y_{1:T}}(y_{1:T}; \theta), \quad \text{where} \quad p_{Y_{1:T}}(y_{1:T}; \theta) = \int p_{X_{1:T},Y_{1:T}}(x_{1:T}, y_{1:T}; \theta)\, \lambda(dx_{1:T}). \tag{11} $$



This model is just a specific latent variable model as discussed in Section 2.3 with $X = X_{1:T}$ and $Y = Y_{1:T}$.
Hence it is possible to approximate $\ell^{(1)}(\theta)$ and $\ell^{(2)}(\theta)$ using Theorem 2 and Theorem 3 and by computing a
Monte Carlo approximation of $p_{\Theta,X_{1:T}|Y_{1:T}}(\vartheta, x_{1:T} \mid y_{1:T}; \theta, \tau)$. Sequential Monte Carlo methods are the tools of
choice to approximate posterior distributions for state-space models. Unfortunately, it is well-documented that
standard sequential Monte Carlo methods would provide very high variance estimators of $p_{\Theta|Y_{1:T}}(\vartheta \mid y_{1:T}; \theta, \tau)$
in this context as the parameter $\Theta$ is static, see [4, 8]. Recently particle Markov chain Monte Carlo [1] and
Sequential Monte Carlo squared [5] algorithms have been developed to address such problems and could be
used to sample from $p_{\Theta|Y_{1:T}}(\vartheta \mid y_{1:T}; \theta, \tau)$. We follow here an alternative approach initiated in [13], leading to a
natural extension of the approximations described in Section 2.2 and their Monte Carlo estimates described in
Section 2.3.
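To fix ideas, the sketch below shows a bootstrap particle filter estimating the log-likelihood (11) when the latent process can only be simulated. The linear Gaussian toy model ($X_1 \sim N(0,1)$, $X_{t+1} = \theta X_t + N(0,1)$, $Y_t = X_t + N(0,1)$) and all function names are assumptions chosen for illustration; the Kalman recursion is included only because this toy model happens to admit an exact reference value.

```python
import math
import random

def simulate(theta, T, rng):
    """Simulate y_{1:T} from the toy linear Gaussian state-space model."""
    ys, x = [], rng.gauss(0.0, 1.0)
    for _ in range(T):
        ys.append(x + rng.gauss(0.0, 1.0))
        x = theta * x + rng.gauss(0.0, 1.0)
    return ys

def bootstrap_loglik(ys, theta, n, rng):
    """Bootstrap particle filter estimate of the log-likelihood."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # propagate by simulating the transition; its density is never evaluated
            xs = [theta * x + rng.gauss(0.0, 1.0) for x in xs]
        ws = [math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi) for x in xs]
        total += math.log(sum(ws) / n)
        xs = rng.choices(xs, weights=ws, k=n)   # multinomial resampling
    return total

def kalman_loglik(ys, theta):
    """Exact log-likelihood of the toy model, used only as a reference."""
    m, p, total = 0.0, 1.0, 0.0
    for y in ys:
        s = p + 1.0
        total += -0.5 * (math.log(2.0 * math.pi * s) + (y - m) ** 2 / s)
        k = p / s
        m, p = m + k * (y - m), p * (1.0 - k)       # update
        m, p = theta * m, theta * theta * p + 1.0   # predict
    return total

rng = random.Random(1)
ys = simulate(0.7, 20, rng)
estimate = bootstrap_loglik(ys, 0.7, 5000, rng)
reference = kalman_loglik(ys, 0.7)
```

The same propagate, weight and resample structure underlies the estimators developed in the rest of this section.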

3.2 An artificial Bayesian model 

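We first extend the model by allowing the parameter $\theta$ to change at each time point. We therefore introduce
$\theta_{1:T} \in \mathbb{R}^{dT}$ and an extended model such that $X_{1:T}, Y_{1:T}$ have a joint density defined by

$$ \bar p_{X_{1:T},Y_{1:T}}(x_{1:T}, y_{1:T}; \theta_{1:T}) = \mu(x_1; \theta_1) \prod_{t=2}^{T} f(x_t \mid x_{t-1}; \theta_t) \prod_{t=1}^{T} g(y_t \mid x_t; \theta_t) \tag{12} $$

and we denote the associated log-likelihood of the observations $Y_{1:T} = y_{1:T}$ by

$$ \bar\ell(\theta_{1:T}) = \log \bar p_{Y_{1:T}}(y_{1:T}; \theta_{1:T}). $$

We write $\theta^{[T]}$ for the vector of $\mathbb{R}^{dT}$ made of $T$ copies of $\theta$ concatenated in a column vector. Similarly to Section
2.1, we now introduce artificial random variables $\Theta_{1:T}$ with prior $\bar\kappa$ centred around $\theta_{1:T} \in \mathbb{R}^{dT}$. Given a prior
$\kappa$ on $\mathbb{R}^d$ as in Section 2.1, satisfying Assumptions 1 and 2, we define $\bar\kappa$ as

$$ \bar\kappa(\vartheta_{1:T}; \theta_{1:T}, \tau) = \prod_{t=1}^{T} \tau^{-d}\,\kappa\{\tau^{-1}(\vartheta_t - \theta_t)\} \tag{13} $$

and denote by $\Sigma_T$ its associated block diagonal covariance matrix. The joint probability density of $(\Theta_{1:T}, X_{1:T}, Y_{1:T})$
is then defined as

$$ \bar p_{\Theta_{1:T},X_{1:T},Y_{1:T}}(\vartheta_{1:T}, x_{1:T}, y_{1:T}; \theta_{1:T}, \tau) = \bar\kappa(\vartheta_{1:T}; \theta_{1:T}, \tau)\, \bar p_{X_{1:T},Y_{1:T}}(x_{1:T}, y_{1:T}; \vartheta_{1:T}) \tag{14} $$

and we denote by $\mathbb{E}_{\theta_{1:T},\tau}$ the expectation with respect to the associated posterior $\bar p_{\Theta_{1:T}|Y_{1:T}}(\vartheta_{1:T} \mid y_{1:T}; \theta_{1:T}, \tau)$.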

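In order to apply the results of Section 2.2, we further assume that the log-likelihood $\bar\ell(\theta_{1:T})$ associated to
this extended state-space model satisfies the following assumption, similar to Assumption 3.

Assumption 5 The log-likelihood function $\bar\ell : \mathbb{R}^{dT} \to \mathbb{R}$ is four times continuously differentiable and, for $\delta$
defined as in Assumption 2, the associated likelihood $\bar{\mathcal{L}} : \mathbb{R}^{dT} \to \mathbb{R}$ satisfies:

$$ \forall \theta \in \mathbb{R}^d \quad \exists\, 0 < \eta < \delta \quad \exists\, \epsilon, D > 0 \quad \forall u_{1:T} \in \mathbb{R}^{dT} \qquad \bar{\mathcal{L}}(\theta^{[T]} + u_{1:T}) < D e^{\epsilon \sum_{t=1}^{T} |u_t|^{\eta}}. $$

This assumption in conjunction with Assumptions 1 and 2 allows us to obtain the equivalent of Theorem 1
for $\mathbb{E}_{\theta^{[T]},\tau}\{h(\Theta_{1:T} - \theta^{[T]}) \mid Y_{1:T} = y_{1:T}\}$ with $\bar\kappa(u_{1:T}) = \prod_{t=1}^{T} \kappa(u_t)$ and $\bar\ell^{(k)}(\theta^{[T]})$ in place of $\kappa$ and $\ell^{(k)}(\theta)$. It is not
stated here for the sake of brevity.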



3.3 Approximation of the score vector and observed information matrix 

It is now possible to adapt the results of Section 2.2 to obtain new estimates of the score and observed information 
matrix for state-space models. 

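Theorem 4 For the artificial Bayesian model introduced in Section 3.2, satisfying Assumptions 1-2-5, for any
$\theta \in \mathbb{R}^d$ there exist $\eta > 0$ and $C < \infty$ such that for all $0 < \tau < \eta$

$$ \left| \ell^{(1)}(\theta) - S_{\tau,T}(\theta) \right| \le C\tau^2 \tag{15} $$

where the score estimator is given by

$$ S_{\tau,T}(\theta) = \tau^{-2}\,\Sigma^{-1} \left\{ \sum_{t=1}^{T} \mathbb{E}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:T} = y_{1:T}) - T\theta \right\}. \tag{16} $$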



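Remark 5 In [13], the score estimate is obtained by considering a different stochastically perturbed state-space
model where $\Theta_0 \sim \tau^{-d}\kappa\{\tau^{-1}(\cdot - \theta)\}$,

$$ \Theta_t - \theta = \Theta_{t-1} - \theta + \sigma V_t, \qquad V_t \overset{\text{i.i.d.}}{\sim} \kappa(\cdot) \tag{17} $$

and it is established that, for any compact $K$, there exist $\eta > 0$ and $C < \infty$ such that

$$ \sup_{\theta \in K} \left| \ell^{(1)}(\theta) - \tau^{-2}\,\Sigma^{-1} \left\{ \mathbb{E}_{\theta,\tau,\sigma}(\Theta_T \mid Y_{1:T} = y_{1:T}) - \theta \right\} \right| \le C \left( \tau + \frac{\sigma^2}{\tau^2} \right). \tag{18} $$

Note that the estimate in (18) requires solving a filtering problem whereas our estimate in (15) requires solving
a smoothing problem.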

As in Section 2.2 we complete this section with an approximation of the observed information matrix. To
obtain simple approximations we assume that $\kappa$ satisfies Assumption 4.

Theorem 6 For the artificial Bayesian model introduced in Section 3.2, satisfying Assumptions 1-2-4-5, for
any $\theta \in \mathbb{R}^d$, there exist $\eta > 0$ and $C < \infty$ such that for all $0 < \tau < \eta$

$$ \left| -\ell^{(2)}(\theta) - I_{\tau,T}(\theta) \right| \le C\tau^2 \tag{19} $$

where the observed information matrix estimator is given by

$$ I_{\tau,T}(\theta) = -\tau^{-4}\,\Sigma^{-1} \left\{ \sum_{s=1}^{T} \sum_{t=1}^{T} \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:T} = y_{1:T}) - \tau^2 T \Sigma \right\} \Sigma^{-1}. \tag{20} $$



3.4 Sequential Monte Carlo estimates 

The approximations of the score vector and observed information matrix given in Section 3.3 require computing
$\mathbb{E}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:T} = y_{1:T})$ and $\operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:T} = y_{1:T})$ for $s, t \in \{1, 2, \ldots, T\}$. These can be approximated
using sequential Monte Carlo methods applied to the modified state-space model described in (14) with $\theta_{1:T} = \theta^{[T]}$.
This provides us with an approximation of $\bar p_{\Theta_{1:T},X_{1:T}|Y_{1:T}}(\vartheta_{1:T}, x_{1:T} \mid y_{1:T}; \theta^{[T]}, \tau)$, hence of its marginals
$\bar p_{\Theta_t|Y_{1:T}}(\vartheta_t \mid y_{1:T}; \theta^{[T]}, \tau)$ and $\bar p_{\Theta_s,\Theta_t|Y_{1:T}}(\vartheta_s, \vartheta_t \mid y_{1:T}; \theta^{[T]}, \tau)$. However, this approximation will be progressively
impoverished as $T$ increases because of the successive resampling steps. Eventually, $\bar p_{\Theta_t|Y_{1:T}}(\vartheta_t \mid y_{1:T}; \theta^{[T]}, \tau)$ will
be approximated by a single unique particle for $T - t$ sufficiently large. To obtain lower variance estimators, it
is possible to use sequential Monte Carlo smoothing techniques. Standard approaches include the forward-backward
smoothing procedure [7] and the generalized two-filter smoothing recursion [3]. However these approaches
are only applicable when we can evaluate $f(x' \mid x; \theta)$ pointwise and the primary motivation for this work is to
address scenarios where this is not possible. In this case, we can only use the bootstrap filter [11] and can still
obtain lower variance estimators at the cost of a small bias increase by using the fact that, when the state-space
model enjoys forgetting properties, we have

$$ \mathbb{E}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:T} = y_{1:T}) \approx \mathbb{E}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:(t+\Delta)\wedge T} = y_{1:(t+\Delta)\wedge T}) $$

for a lag $\Delta$ large enough. This fixed-lag approximation was first proposed in [14] and has been studied in [16].
Similarly, considering w.l.o.g. $t > s$, we have

$$ \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:T} = y_{1:T}) \approx \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:(t+\Delta)\wedge T} = y_{1:(t+\Delta)\wedge T}) $$

and for $t - s > \Delta$

$$ \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:T} = y_{1:T}) \approx 0. $$



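This suggests approximating in practice, using the bootstrap filter, the score vector $S_{\tau,T}(\theta)$ and the observed
information matrix $I_{\tau,T}(\theta)$ by the following fixed-lag smoothing approximations:

$$ S_{\tau,\Delta,T}(\theta) = \tau^{-2}\,\Sigma^{-1} \left\{ \sum_{t=1}^{T} \mathbb{E}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:(t+\Delta)\wedge T} = y_{1:(t+\Delta)\wedge T}) - T\theta \right\}, \tag{21} $$

$$ I_{\tau,\Delta,T}(\theta) = -\tau^{-4}\,\Sigma^{-1} \left\{ \sum_{t=1}^{T} \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_t \mid Y_{1:(t+\Delta)\wedge T} = y_{1:(t+\Delta)\wedge T}) + 2 \sum_{s=1}^{T} \sum_{t=s+1}^{(s+\Delta)\wedge T} \operatorname{Cov}_{\theta^{[T]},\tau}(\Theta_s, \Theta_t \mid Y_{1:(t+\Delta)\wedge T} = y_{1:(t+\Delta)\wedge T}) - \tau^2 T \Sigma \right\} \Sigma^{-1} \tag{22} $$

with the convention that $\sum_{k=i}^{j} = 0$ if $i > j$.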
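A minimal sketch of the fixed-lag score approximation, computed with a bootstrap filter on the perturbed model, is given below. It is an illustration under assumptions that are not in the paper: the toy model is $X_1 \sim N(0,1)$, $X_t = \theta X_{t-1} + N(0,1)$, $Y_t = X_t + N(0,1)$ with scalar $\theta$, the prior $\kappa$ is standard normal (so $\Sigma = 1$), resampling is multinomial at every step, and the function names are hypothetical. Each particle carries only the last $\Delta + 1$ perturbations $\Theta_t - \theta$ of its ancestral path.

```python
import math
import random

def fixed_lag_score(ys, theta, tau, lag, n, rng):
    """Fixed-lag bootstrap-filter approximation of the score (toy AR(1) model)."""
    T = len(ys)
    xs = [0.0] * n
    hists = [[] for _ in range(n)]   # last (lag + 1) values of Theta_t - theta
    total = 0.0                      # sum over t of E(Theta_t - theta | y_{1:min(t+lag, T)})
    for t, y in enumerate(ys, start=1):
        for i in range(n):
            dev = tau * rng.gauss(0.0, 1.0)       # fresh perturbation at every time
            if t == 1:
                xs[i] = rng.gauss(0.0, 1.0)       # initial law does not depend on theta
            else:
                xs[i] = (theta + dev) * xs[i] + rng.gauss(0.0, 1.0)
            hists[i] = (hists[i] + [dev])[-(lag + 1):]
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]
        sw = sum(ws)
        if t == T:        # flush every time index still inside the window
            total += sum(w * sum(h) for w, h in zip(ws, hists)) / sw
        elif t > lag:     # the oldest stored value is Theta_{t-lag} - theta
            total += sum(w * h[0] for w, h in zip(ws, hists)) / sw
        idx = rng.choices(range(n), weights=ws, k=n)   # multinomial resampling
        xs = [xs[j] for j in idx]
        hists = [hists[j] for j in idx]
    return total / tau ** 2

def kalman_loglik(ys, theta):
    """Exact log-likelihood of the toy model, used only to build a reference score."""
    m, p, total = 0.0, 1.0, 0.0
    for y in ys:
        s = p + 1.0
        total += -0.5 * (math.log(2.0 * math.pi * s) + (y - m) ** 2 / s)
        k = p / s
        m, p = m + k * (y - m), p * (1.0 - k)
        m, p = theta * m, theta * theta * p + 1.0
    return total

rng = random.Random(3)
x, ys = rng.gauss(0.0, 1.0), []
for _ in range(10):                       # data simulated with theta = 0.8
    ys.append(x + rng.gauss(0.0, 1.0))
    x = 0.8 * x + rng.gauss(0.0, 1.0)
estimate = fixed_lag_score(ys, 0.2, 0.2, 3, 20000, rng)
reference = (kalman_loglik(ys, 0.2 + 1e-5) - kalman_loglik(ys, 0.2 - 1e-5)) / 2e-5
```

The estimate remains noisy: its standard deviation scales roughly like $\sqrt{T}/(\tau\sqrt{N})$, so it should be read as a sanity check against the reference rather than a precise value.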



3.5 Convergence results 

We first quantify below the bias brought by the fixed-lag approximation. Our bounds rely on the following 
mixing assumption. 



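Assumption 6 (a) The set $S(\theta) = \{\vartheta \in \mathbb{R}^d : \kappa\{\tau^{-1}(\vartheta - \theta)\} > 0\}$ is compact, so $d(\theta, \tau) := \sup_{\vartheta \in S(\theta)} |\vartheta| < \infty$.

(b) $\lambda(dx)$ is a probability measure.

(c) $\underline{\alpha}(\theta) = \inf_{\vartheta \in S(\theta),\,(x,x') \in \mathcal{X}^2} f(x' \mid x; \vartheta) > 0$, $\overline{\alpha}(\theta) = \sup_{\vartheta \in S(\theta),\,(x,x') \in \mathcal{X}^2} f(x' \mid x; \vartheta) < \infty$ so $\rho(\theta) = 1 - \underline{\alpha}(\theta)/\overline{\alpha}(\theta) > 0$.

(d) for all $y \in \mathcal{Y}$, $\overline{g}(y; \theta) = \sup_{\vartheta \in S(\theta),\, x \in \mathcal{X}} g(y \mid x; \vartheta) < \infty$, $\underline{g}(y; \theta) = \int g(y \mid x; \vartheta)\, \tau^{-d}\kappa\{\tau^{-1}(\vartheta - \theta)\}\, d\vartheta\, \lambda(dx) > 0$,
$g_1(y; \theta) = \int g(y \mid x; \vartheta)\, \tau^{-d}\kappa\{\tau^{-1}(\vartheta - \theta)\}\, d\vartheta\, \mu(x; \theta)\, \lambda(dx) > 0$.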

Weaker conditions could be used at the cost of substantially more complex proofs; see [6, chapter 4], [18].
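Proposition 7 Suppose Assumption 6. Then for all integers $T \ge 1$, $0 \le \Delta \le T - 1$, we have

$$ \tau^2 \left| \Sigma \{ S_{\tau,\Delta,T}(\theta) - S_{\tau,T}(\theta) \} \right| \le 2\, d(\theta, \tau)\, \rho(\theta)^{\Delta}\, (T - 1 - \Delta), \tag{23} $$

$$ \tau^4 \left| \Sigma \{ I_{\tau,\Delta,T}(\theta) - I_{\tau,T}(\theta) \} \Sigma \right| \le 2\, d(\theta, \tau)^2\, \rho(\theta)^{\Delta}\, (T - 1 - \Delta) \left\{ 3 + 6\Delta + \frac{2\rho(\theta)}{1 - \rho(\theta)} \right\}. \tag{24} $$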

Let $S^{N}_{\tau,\Delta,T}(\theta)$ and $I^{N}_{\tau,\Delta,T}(\theta)$ be the bootstrap filter approximations of $S_{\tau,\Delta,T}(\theta)$ and $I_{\tau,\Delta,T}(\theta)$ based on $N$
particles. The following proposition relies partly on results in [16].

Proposition 8 Suppose Assumption 6. Then for all integers $T \ge 1$, $0 \le \Delta \le T - 1$, $N \ge 1$ and for any $p \ge 2$,
there exist constants $B$ and $B_p$, the latter dependent only on $p$, such that



T'-\E[j:{s?^^,j.{e)-Sr,T{0)}]\< 



T^ |E [E {/^^^j, {9) - Ir,A.T (0)} S] I < 



N 



diO,T) 

N 



Y.Ct{o) 



(25) 



T T (s+A)AT 

Y,{zCt{6) + DUe)] + 2Y, E {Cs{e) + 2Ct{6) + Dl{e)DUe)] 

(26) 



and 



,2]E1/P 



r^Fl/P 



'\^{S?.A,Tie)-Sr.A,Ti0)}\'' 
S{/.^A,TW-/r,A,T(0)}Sr 



^die^Y^DUo) 



< 



N 



diO.T) 



N 



T T (s+A)AT 

Y,2,DUe) + 2Y^ E {D?(0)+2i?f(0)} 

t=l S = l t = iS+l 



(27) 
(28) 



where the expectations are with respect to the law of the bootstrap filter and 



CtiO) 



DliO) 



B 



a{ef{i-p{e)f 



(t+A)AT_ 2 _ I a^2 



_B„ 



fe=2 5(yfc;^) 

(t+A)AT 



a{B){\-p(d)\ ^ g{yk;0) ^ i-. .n^ 



k=2 3. ^ 






4 Discussion 



Building upon [12, 13], we have proposed a derivative-free estimator of the observed information matrix for 
general statistical models which can be computed easily using Bayesian computational tools. In the specific 
context of state-space models, we have also obtained new derivative-free estimators of the score vector and 
observed information matrix. These estimators are obtained by solving smoothing problems for a modified state- 
space model that differs from the one proposed in [12, 13]. Under mixing assumptions on the original state-space 
model, it is possible to obtain quantitative bounds for the resulting sequential Monte Carlo estimators. 

Extensive numerical experiments comparing in practical situations the various estimators discussed in the 
paper will be made available shortly. 
A Proofs

In the proofs we will use the notation

$$ C.u^{\otimes k} = C \times \sum_{1 \le i_1, \ldots, i_k \le d} u_{i_1} \cdots u_{i_k} $$

for any $C \in \mathbb{R}$ and we will denote by $B(x, r)$ the ball of radius $r > 0$ centered at $x \in \mathbb{R}^d$ for the $L_1$ norm.
Our calculations use extensively the elementary fact that, for $\kappa$ a symmetric prior, for odd $k$ and for any
$1 \le i_1, \ldots, i_k \le d$

$$ \int u_{i_1} \cdots u_{i_k}\, \kappa(u)\,du = 0. \tag{29} $$



The proof of Theorem 1 uses techniques borrowed from the literature on Bayesian asymptotic theory, see for
instance the first chapter of [9] and references therein. Indeed both estimators provided by Theorems 2 and 3
are based on the asymptotic moments of the posterior distribution as it concentrates. However in our context,
the posterior concentrates because the prior concentrates whereas the likelihood function is fixed. In Bayesian
asymptotic theory, the posterior concentrates because the likelihood function concentrates and the prior is fixed.
The proof of the main Theorem 1 relies on the following Proposition.

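Proposition 9 Suppose Assumptions 1-2-3. For any $\theta \in \mathbb{R}^d$ and $h : \mathbb{R}^d \to \mathbb{R}^m$ satisfying (4) for some
constants $\alpha \ge 0$ and $c > 0$ we have for any $\tau > 0$

$$
\begin{aligned}
\int h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du
 ={}& \int h(\tau u)\,\kappa(u)\,du + \tau \int h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ \tau^2 \int h(\tau u) \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \\
 &+ \tau^3 \int h(\tau u) \Big\{ \tfrac{1}{3!}\big(\ell^{(1)}(\theta).u\big)^3 + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)\big(\ell^{(2)}(\theta).u^{\otimes 2}\big) + \tfrac{1}{3!}\ell^{(3)}(\theta).u^{\otimes 3} \Big\} \kappa(u)\,du \\
 &+ O(\tau^{4+\alpha}).
\end{aligned}
$$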

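Proof. We start by dividing the integral into two parts

$$ \int h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du = \int_{\tau|u| \le \rho} h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du + \int_{\tau|u| > \rho} h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du $$

for $\rho$ in $\mathbb{R}_+$. We now fix $\rho$.

The expansion in Proposition 9 stems from the first part, while the second part will end up in the $O(\tau^{4+\alpha})$
remainder. We look at these two terms separately.

First let us rewrite the first part of the integral as

$$ \int_{\tau|u| \le \rho} h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du = \int_{\tau|u| \le \rho} h(\tau u) \exp\{\ell(\theta + \tau u) - \ell(\theta)\}\, \kappa(u)\,du. $$

We then use multiple Taylor expansions for a fixed value of $\tau u$. First, we have

$$ \ell(\theta + \tau u) = \ell(\theta) + \tau\,\ell^{(1)}(\theta).u + \frac{\tau^2}{2}\,\ell^{(2)}(\theta).u^{\otimes 2} + \frac{\tau^3}{3!}\,\ell^{(3)}(\theta).u^{\otimes 3} + R_3(\theta, \tau u) $$

where $R_3(\theta, \tau u)$ simply denotes the remainder. Then using

$$ e^{x} = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \frac{x^4}{3!} \int_0^1 (1 - t)^3 e^{xt}\,dt, $$

we obtain

$$
\begin{aligned}
e^{\ell(\theta + \tau u) - \ell(\theta)} ={}& 1 + \Big\{ \tau\,\ell^{(1)}(\theta).u + \tfrac{\tau^2}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{\tau^3}{3!}\ell^{(3)}(\theta).u^{\otimes 3} + R_3(\theta, \tau u) \Big\} \\
 &+ \tfrac{1}{2} \Big\{ \tau\,\ell^{(1)}(\theta).u + \tfrac{\tau^2}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{\tau^3}{3!}\ell^{(3)}(\theta).u^{\otimes 3} + R_3(\theta, \tau u) \Big\}^2 \\
 &+ \tfrac{1}{3!} \Big\{ \tau\,\ell^{(1)}(\theta).u + \tfrac{\tau^2}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{\tau^3}{3!}\ell^{(3)}(\theta).u^{\otimes 3} + R_3(\theta, \tau u) \Big\}^3 \\
 &+ \tfrac{1}{3!} \Big\{ \tau\,\ell^{(1)}(\theta).u + \tfrac{\tau^2}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{\tau^3}{3!}\ell^{(3)}(\theta).u^{\otimes 3} + R_3(\theta, \tau u) \Big\}^4 \int_0^1 (1 - t)^3 e^{\{\ell(\theta + \tau u) - \ell(\theta)\}t}\,dt.
\end{aligned}
\tag{30}
$$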

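We then integrate this expression multiplied by $h(\tau u)\kappa(u)$ over $u$ on the set $\{u : \tau|u| \le \rho\}$, and group the terms
as follows

$$
\begin{aligned}
\int_{\tau|u| \le \rho} h(\tau u)\, e^{\ell(\theta + \tau u) - \ell(\theta)}\, \kappa(u)\,du
 ={}& \int_{\tau|u| \le \rho} h(\tau u)\,\kappa(u)\,du + \tau \int_{\tau|u| \le \rho} h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ \tau^2 \int_{\tau|u| \le \rho} h(\tau u) \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \\
 &+ \tau^3 \int_{\tau|u| \le \rho} h(\tau u) \Big\{ \tfrac{1}{3!}\big(\ell^{(1)}(\theta).u\big)^3 + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)\big(\ell^{(2)}(\theta).u^{\otimes 2}\big) + \tfrac{1}{3!}\ell^{(3)}(\theta).u^{\otimes 3} \Big\} \kappa(u)\,du \\
 &+ Q(\theta, \tau)
\end{aligned}
$$

where $Q(\theta, \tau)$ is the sum of all the remaining terms that were in (30). We would like to prove that $Q(\theta, \tau) = O(\tau^{4+\alpha})$.
Let us look at the various terms in this remainder.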



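1. First, some terms of this sum are of the form:

$$ K \tau^{j_1 + 2j_2 + 3j_3} \int_{\tau|u| \le \rho} h(\tau u) \big(\ell^{(1)}(\theta).u\big)^{j_1} \big(\ell^{(2)}(\theta).u^{\otimes 2}\big)^{j_2} \big(\ell^{(3)}(\theta).u^{\otimes 3}\big)^{j_3} \kappa(u)\,du \tag{31} $$

for some $K > 0$ and non-negative integers $j_1, j_2, j_3$ such that $j_1 + 2j_2 + 3j_3 \ge 4$. Using (4) and Assumption
1 guaranteeing that $\kappa$ has finite moments of all orders, we bound the integral in (31) by $L\tau^{\alpha}$ for some
constant $L$ independent of $\tau$ and hence terms like in (31) are $O(\tau^{j_1 + 2j_2 + 3j_3 + \alpha}) = O(\tau^{4+\alpha})$.

2. Then, some terms are similar to (31) but also include powers of $R_3(\theta, \tau u)$ under the integral. We use the
Lagrange form of the remainder $R_3(\theta, \tau u)$:

$$ R_3(\theta, \tau u) = \frac{\tau^4}{4!}\, \ell^{(4)}(\theta^*).u^{\otimes 4} $$

for some $\theta^* \in B(\theta, \tau|u|)$. Since the log-likelihood $\ell$ has a continuous fourth derivative (Assumption 3) there
is a constant $m(\theta, \rho) > 0$ independent of $\tau$ such that for every $\theta^* \in B(\theta, \rho)$ and every $u \in \mathbb{R}^d$:

$$ \left| \ell^{(4)}(\theta^*).u^{\otimes 4} \right| \le \left| m(\theta, \rho).u^{\otimes 4} \right|. \tag{32} $$

Hence, we have

$$ |R_3(\theta, \tau u)| \le \frac{\tau^4 \left| m(\theta, \rho).u^{\otimes 4} \right|}{4!}. $$

Therefore the terms that include powers of $R_3(\theta, \tau u)$ are also $O(\tau^{4+\alpha})$.

3. Finally, some terms of $Q(\theta, \tau)$ (the ones that come from the last line of (30)) also include the following
expression under the integral over $u$

$$ \int_0^1 (1 - t)^3 e^{\{\ell(\theta + \tau u) - \ell(\theta)\}t}\,dt. $$

Since the log-likelihood $\ell$ is continuous (Assumption 3) there exists a constant $C(\theta, \rho) > 0$ independent of
$\tau$ such that

$$ \sup_{\theta^* \in B(\theta, \rho)} |\ell(\theta^*) - \ell(\theta)| \le C(\theta, \rho). \tag{33} $$

Hence we obtain for any $u$ such that $\tau|u| \le \rho$

$$ \int_0^1 (1 - t)^3 e^{\{\ell(\theta + \tau u) - \ell(\theta)\}t}\,dt \le \int_0^1 (1 - t)^3 e^{C(\theta, \rho)t}\,dt \le \tilde{C}(\theta, \rho) $$

for some $\tilde{C}(\theta, \rho) \in \mathbb{R}$ and this integral does not cause additional difficulties.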
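Hence all terms in $Q(\theta, \tau)$ are $O(\tau^{4+\alpha})$. At this point we have

$$
\begin{aligned}
\int_{\tau|u| \le \rho} h(\tau u)\, e^{\ell(\theta + \tau u) - \ell(\theta)}\, \kappa(u)\,du
 ={}& \int_{\tau|u| \le \rho} h(\tau u)\,\kappa(u)\,du + \tau \int_{\tau|u| \le \rho} h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ \tau^2 \int_{\tau|u| \le \rho} h(\tau u) \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \\
 &+ \tau^3 \int_{\tau|u| \le \rho} h(\tau u) \Big\{ \tfrac{1}{3!}\big(\ell^{(1)}(\theta).u\big)^3 + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)\big(\ell^{(2)}(\theta).u^{\otimes 2}\big) + \tfrac{1}{3!}\ell^{(3)}(\theta).u^{\otimes 3} \Big\} \kappa(u)\,du \\
 &+ O(\tau^{4+\alpha}).
\end{aligned}
$$

However we would like the integrals on the right-hand side of the equality sign to be over the whole space (as
in the statement of Proposition 9) instead of being restricted to $\{u : \tau|u| \le \rho\}$. Hence we want to add the
following terms on the right-hand side

$$ \tau^k \int_{\tau|u| > \rho} h(\tau u)\, f^{[k]}(\theta).u^{\otimes k}\, \kappa(u)\,du \quad \text{for } k = 0, 1, 2, 3 $$

where $f^{[0]}(\theta).u^{\otimes 0} = 1$, $f^{[1]}(\theta).u^{\otimes 1} = \ell^{(1)}(\theta).u$, $f^{[2]}(\theta).u^{\otimes 2} = \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}(\ell^{(1)}(\theta).u)^2$ etc. We do this by
proving that these integrals are $O(\tau^{4+\alpha})$.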

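For $\tau$ small enough such that $\tau < \rho/M$ with $M$ as in Assumption 2, we can write

$$ \left| \int_{\tau|u| > \rho} h(\tau u)\, f^{[k]}(\theta).u^{\otimes k}\, \kappa(u)\,du \right| \le \int_{\tau|u| > \rho} |h(\tau u)| \left| f^{[k]}(\theta).u^{\otimes k} \right| e^{-\gamma |u|^{\delta}}\,du $$

and we can bound $|f^{[k]}(\theta).u^{\otimes k}|$ by $|f^{[k]}(\theta)|\,|u|^k$ where

$$ |f^{[k]}(\theta)| := \sup_{|u| = 1} \left| f^{[k]}(\theta).u^{\otimes k} \right| < \infty $$

to obtain

$$ \left| \int_{\tau|u| > \rho} h(\tau u)\, f^{[k]}(\theta).u^{\otimes k}\, \kappa(u)\,du \right| \le c\,\tau^{\alpha}\, |f^{[k]}(\theta)| \int_{\tau|u| > \rho} |u|^{k + \alpha} e^{-\gamma |u|^{\delta}}\,du. $$

We will conclude by proving for any $m \in \mathbb{R}$, $\mu \in \mathbb{R}$

$$ \int_{\tau|u| > \rho} |u|^{m} e^{-\gamma |u|^{\delta}}\,du = O(\tau^{\mu}). \tag{34} $$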

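Let us prove (34). We have

$$ |u|_2 \le |u| \le d^{1/2} |u|_2 $$

where $|u|_2 = \left( \sum_{i=1}^{d} u_i^2 \right)^{1/2}$ is the Euclidean norm so

$$ \int_{\tau|u| > \rho} |u|^{m} e^{-\gamma |u|^{\delta}}\,du \le d^{m/2} \int_{\tau|u|_2 > \rho} |u|_2^{m}\, e^{-\gamma |u|_2^{\delta}}\,du. $$

First we change to spherical coordinates

$$ \int_{\tau|u|_2 > \rho} |u|_2^{m}\, e^{-\gamma |u|_2^{\delta}}\,du = S_{d-1}(1) \left( \int_{\tau r > \rho} r^{m + d - 1} e^{-\gamma r^{\delta}}\,dr \right) $$

where $r$ represents the radius, and $S_{d-1}(1)$ is the surface of the $d$-dimensional unit ball of radius 1 associated
to the Euclidean norm. We now handle a simpler integral on the one-dimensional variable $r$:

$$
\begin{aligned}
\int_{\tau r > \rho} r^{m + d - 1} e^{-\gamma r^{\delta}}\,dr
 &= \frac{1}{\delta} \int_{\tau s^{1/\delta} > \rho} s^{\frac{m + d - \delta}{\delta}}\, e^{-\gamma s}\,ds && \text{(with } s = r^{\delta}\text{)} \\
 &= \frac{1}{\delta}\, \tau^{-(m + d)} \int_{t > \rho^{\delta}} t^{\frac{m + d - \delta}{\delta}}\, e^{-\gamma t / \tau^{\delta}}\,dt && \text{(with } t = s\tau^{\delta}\text{)} \\
 &= \frac{1}{\delta}\, \tau^{-(m + d)}\, e^{-\gamma \rho^{\delta} / \tau^{\delta}} \int_{t > 0} (t + \rho^{\delta})^{\frac{m + d - \delta}{\delta}}\, e^{-\gamma t / \tau^{\delta}}\,dt && \text{(shifting } t \text{ and factoring out } e^{-\gamma \rho^{\delta} / \tau^{\delta}}\text{)} \\
 &\le \frac{1}{\delta}\, \tau^{-(m + d)}\, e^{-\gamma \rho^{\delta} / \tau^{\delta}} \int_{t > 0} (t + \rho^{\delta})^{\frac{m + d - \delta}{\delta}}\, e^{-\gamma t}\,dt && \text{(as long as } \tau < 1 \text{ so that } \forall t > 0,\ -\gamma t / \tau^{\delta} < -\gamma t\text{)} \\
 &= O(\tau^{\mu}) && \text{(for all } \mu \in \mathbb{R}\text{)}.
\end{aligned}
$$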

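This allows us to conclude that $\int_{\tau|u| > \rho} h(\tau u)\, f^{[k]}(\theta).u^{\otimes k}\, \kappa(u)\,du = O(\tau^{4+\alpha})$ for any $k = 0, 1, 2, 3$ and we finally obtain
the desired expansion

$$
\begin{aligned}
\int_{\tau|u| \le \rho} h(\tau u)\, e^{\ell(\theta + \tau u) - \ell(\theta)}\, \kappa(u)\,du
 ={}& \int h(\tau u)\,\kappa(u)\,du + \tau \int h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
 &+ \tau^2 \int h(\tau u) \Big\{ \tfrac{1}{2}\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)^2 \Big\} \kappa(u)\,du \\
 &+ \tau^3 \int h(\tau u) \Big\{ \tfrac{1}{3!}\big(\ell^{(1)}(\theta).u\big)^3 + \tfrac{1}{2}\big(\ell^{(1)}(\theta).u\big)\big(\ell^{(2)}(\theta).u^{\otimes 2}\big) + \tfrac{1}{3!}\ell^{(3)}(\theta).u^{\otimes 3} \Big\} \kappa(u)\,du \\
 &+ O(\tau^{4+\alpha}).
\end{aligned}
$$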
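The second part of the integral of interest

$$ \int_{\tau|u| > \rho} h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du $$

ends up inside an $O(\tau^{4+\alpha})$ term through the following reasoning (note that we do not use Taylor expansions for
this part, since it is the part where $\tau u$ is large). Using Assumptions 2 and 3 we have (with $M$ as in Assumption
2):

$$
\begin{aligned}
\left| \int_{\tau|u| > \rho} h(\tau u)\, \frac{\mathcal{L}(\theta + \tau u)}{\mathcal{L}(\theta)}\, \kappa(u)\,du \right|
 &\le \frac{c\,\tau^{\alpha}}{\mathcal{L}(\theta)} \int_{\tau|u| > \rho} |u|^{\alpha}\, \mathcal{L}(\theta + \tau u)\, \kappa(u)\,du \qquad (35) \\
 &\le \frac{cD}{\tau^{d} \mathcal{L}(\theta)} \int_{|v| > \rho} |v|^{\alpha}\, e^{\epsilon |v|^{\eta} - \gamma |v|^{\delta} \tau^{-\delta}}\,dv && \text{(provided that } \tau < \rho/M \text{ and with } v = \tau u\text{)} \\
 &\le \frac{cD\, d^{\alpha/2}}{\tau^{d} \mathcal{L}(\theta)} \int_{|v|_2 > \rho} |v|_2^{\alpha}\, e^{\epsilon d^{\eta/2} |v|_2^{\eta} - \gamma |v|_2^{\delta} \tau^{-\delta}}\,dv && \text{(switching to Euclidean norm)} \\
 &= \frac{cD\, d^{\alpha/2}\, S_{d-1}(1)}{\tau^{d} \mathcal{L}(\theta)} \int_{r > \rho} r^{\alpha + d - 1}\, e^{\epsilon d^{\eta/2} r^{\eta} - \gamma r^{\delta} \tau^{-\delta}}\,dr && \text{(changing to spherical coordinates)}.
\end{aligned}
$$

Then we take advantage of the assumption $\eta < \delta$ to bound $\epsilon d^{\eta/2} r^{\eta} - \gamma r^{\delta} \tau^{-\delta}$ by $-a r^{\eta} \tau^{-\delta}$ for some $a > 0$, again
for $\tau$ small enough. Indeed consider the expression $\epsilon d^{\eta/2} r^{\eta} - \gamma r^{\delta} \tau^{-\delta}$ for $\epsilon, \eta, \gamma, \delta > 0$ with $\eta < \delta$, on the set
$\{r > \rho\}$. Then

$$ \epsilon d^{\eta/2} r^{\eta} - \gamma r^{\delta} \tau^{-\delta} = r^{\eta} \tau^{-\delta} \left( \epsilon d^{\eta/2} \tau^{\delta} - \gamma r^{\delta - \eta} \right) \le r^{\eta} \tau^{-\delta} \left( \epsilon d^{\eta/2} \tau^{\delta} - \gamma \rho^{\delta - \eta} \right) $$

where we bounded $r^{\delta - \eta}$ from below by $\rho^{\delta - \eta}$ using $\delta - \eta > 0$. Then we take $\tau$ small enough so that $\epsilon d^{\eta/2} \tau^{\delta} - \gamma \rho^{\delta - \eta} < -a < 0$
for some $a > 0$. With such an $a$, we have $\epsilon d^{\eta/2} r^{\eta} - \gamma r^{\delta} \tau^{-\delta} \le -a r^{\eta} \tau^{-\delta}$.
We end up with the following integral

$$ \int_{r > \rho} r^{\alpha + d - 1}\, e^{-a r^{\eta} \tau^{-\delta}}\,dr. $$

We then use the same reasoning as in the end of the previous section (see (34)) to conclude that this integral is
$O(\tau^{\mu})$ for any power $\mu$. ■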

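Proof of Theorem 1. We have the identity

\begin{align*}
\mathbb{E}_{\theta,\tau}\{h(\Theta-\theta)\mid Y=y\}
&= \frac{\int h(\vartheta-\theta)\,p_Y(y;\vartheta)\,\tau^{-d}\kappa\{(\vartheta-\theta)/\tau\}\,d\vartheta}{\int p_Y(y;\vartheta)\,\tau^{-d}\kappa\{(\vartheta-\theta)/\tau\}\,d\vartheta} \\
&= \frac{\int h(\tau u)\exp\{\ell(\theta+\tau u)-\ell(\theta)\}\,\kappa(u)\,du}{\int \exp\{\ell(\theta+\tau u)-\ell(\theta)\}\,\kappa(u)\,du} \tag{36}
\end{align*}

where we have used Bayes' formula in the first line and the substitution $u = \tau^{-1}(\vartheta-\theta)$ in the second line. The numerator corresponds to the expansion obtained in Proposition 9, while the denominator is a particular case of the numerator when $h$ is the constant function $h : u \mapsto 1$. In this case, Proposition 9 yields with $\alpha = 0$: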

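\[
\int \exp\{\ell(\theta+\tau u)-\ell(\theta)\}\,\kappa(u)\,du = 1 + 0 + \tau^{2}\int\left\{\tfrac{1}{2}\,\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du + 0 + O(\tau^{4})
\]

where the zeros come from using (29). Hence

\[
\frac{1}{\int \exp\{\ell(\theta+\tau u)-\ell(\theta)\}\,\kappa(u)\,du} = 1 - \tau^{2}\int\left\{\tfrac{1}{2}\,\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du + O(\tau^{4})
\]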
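and finally

\begin{align*}
\mathbb{E}_{\theta,\tau}\{h(\Theta-\theta)\mid Y=y\}
&= \Big[\int h(\tau u)\,\kappa(u)\,du + \tau\int h(\tau u)\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
&\qquad + \tau^{2}\int h(\tau u)\left\{\tfrac{1}{2}\,\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du \\
&\qquad + \tau^{3}\int h(\tau u)\left\{\tfrac{1}{3!}\,(\ell^{(1)}(\theta).u)^{3} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)(\ell^{(2)}(\theta).u^{\otimes 2}) + \tfrac{1}{3!}\,\ell^{(3)}(\theta).u^{\otimes 3}\right\}\kappa(u)\,du + O(\tau^{4+\alpha})\Big] \\
&\quad \times \Big[1 - \tau^{2}\int\left\{\tfrac{1}{2}\,\ell^{(2)}(\theta).u^{\otimes 2} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du + O(\tau^{4})\Big].
\end{align*}

Putting the terms of order $O(\tau^{4+\alpha})$ together we obtain Theorem 1. ■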

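Proof of Theorem 2. This is a direct consequence of Theorem 1. For $h(u) = u$, we have $\alpha = 1$, and using (29)

\begin{align*}
\Big|\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y) &- \tau^{2}\int u\;\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
&- \tau^{4}\int u\left\{\tfrac{1}{6}\,(\ell^{(1)}(\theta).u)^{3} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)(\ell^{(2)}(\theta).u^{\otimes 2}) + \tfrac{1}{6}\,\ell^{(3)}(\theta).u^{\otimes 3}\right\}\kappa(u)\,du \\
&+ \frac{\tau^{4}}{2}\int\left\{\ell^{(2)}(\theta).u^{\otimes 2} + (\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du \int u\;\ell^{(1)}(\theta).u\;\kappa(u)\,du\,\Big| \le C'\tau^{5}.
\end{align*}

Under Assumptions 1 and 2, the integrals appearing in the $\tau^{4}$ terms are bounded, so there exist $\eta > 0$ and $C'' < \infty$ such that for $0 < \tau < \eta$,

\[
\Big|\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y) - \tau^{2}\int u\;\ell^{(1)}(\theta).u\;\kappa(u)\,du\,\Big| \le C''\tau^{4}.
\]

We can now conclude by noticing that $\int u u^{T}\kappa(u)\,du = \Sigma$, where $\Sigma$ is defined in Assumption 1. ■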
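The following minimal sketch illustrates the statement of Theorem 2 numerically on a toy model that is not taken from the paper: a single observation $y \sim N(\theta, 1)$ with a standard Gaussian kernel $\kappa$, so that $d = 1$ and $\Sigma = 1$. The function names, the trapezoid quadrature and the numerical values are assumptions made for illustration only. In this toy case the estimate equals $(y-\theta)/(1+\tau^{2})$ in closed form, so its error against the exact score $y-\theta$ is of order $\tau^{2}$.

```python
import math


def artificial_posterior_mean(loglik, theta, tau, half_width=12.0, n=4801):
    """E(Theta - theta | Y = y) for Theta = theta + tau * U, U ~ N(0, 1).

    Computed with a trapezoid rule in u; loglik is the log-likelihood of
    the observed data as a function of the scalar parameter.
    """
    step = 2.0 * half_width / (n - 1)
    base = loglik(theta)  # subtracting it keeps the exponent well scaled
    num = den = 0.0
    for i in range(n):
        u = -half_width + i * step
        w = math.exp(loglik(theta + tau * u) - base - 0.5 * u * u)
        if i == 0 or i == n - 1:
            w *= 0.5  # trapezoid end-point weights
        num += w * tau * u
        den += w
    return num / den


def score_estimate(loglik, theta, tau):
    """Derivative-free estimate tau^{-2} Sigma^{-1} E(Theta - theta | y), Sigma = 1."""
    return artificial_posterior_mean(loglik, theta, tau) / tau ** 2


# Toy model (an assumption of this sketch): one observation y ~ N(theta, 1).
y, theta = 1.3, 0.4
loglik = lambda t: -0.5 * (y - t) ** 2
exact_score = y - theta

# Absolute errors for decreasing tau; they shrink roughly like tau^2.
errors = [abs(score_estimate(loglik, theta, tau) - exact_score)
          for tau in (0.4, 0.2, 0.1)]
```

Only evaluations of the log-likelihood are used, never its derivatives; in practice the posterior expectation would be estimated by Monte Carlo rather than by quadrature.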
The proof of Theorem 3 relies on the following Proposition. 

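Proposition 10 Suppose Assumptions 1-2-3-4. For any $\theta \in \mathbb{R}^{d}$, there exist $\eta > 0$ and $C < \infty$ such that for all $0 < \tau < \eta$

\[
\Big|\,\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} - \tau^{2}\Sigma - \tau^{4} A \circ \big(\ell^{(2)}(\theta) + \ell^{(1)}(\theta)\,\ell^{(1)}(\theta)^{T}\big)\Big| \le C\tau^{6} \tag{37}
\]

where $\circ$ is the Hadamard product (i.e. element-wise product) and

\[
A = \begin{pmatrix}
\sigma_{1}^{4} & \sigma_{1}^{2}\sigma_{2}^{2} & \cdots & \sigma_{1}^{2}\sigma_{d}^{2} \\
\sigma_{2}^{2}\sigma_{1}^{2} & \sigma_{2}^{4} & \cdots & \sigma_{2}^{2}\sigma_{d}^{2} \\
\vdots & & \ddots & \vdots \\
\sigma_{d}^{2}\sigma_{1}^{2} & \cdots & \sigma_{d}^{2}\sigma_{d-1}^{2} & \sigma_{d}^{4}
\end{pmatrix}. \tag{38}
\]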



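Proof. The result is established by using Theorem 1 for $h(u) = uu^{T}$ and $\alpha = 2$. Given the matrix norm we use, Theorem 1 still holds and yields

\begin{align*}
\Big|\,\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} &- \tau^{2}\int uu^{T}\kappa(u)\,du - \tau^{3}\int uu^{T}\,\ell^{(1)}(\theta).u\;\kappa(u)\,du \\
&- \frac{\tau^{4}}{2}\int\Big(uu^{T} - \int vv^{T}\kappa(v)\,dv\Big)\left\{\ell^{(2)}(\theta).u^{\otimes 2} + (\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du \\
&- \tau^{5}\int uu^{T}\left\{\tfrac{1}{6}\,(\ell^{(1)}(\theta).u)^{3} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)(\ell^{(2)}(\theta).u^{\otimes 2}) + \tfrac{1}{6}\,\ell^{(3)}(\theta).u^{\otimes 3}\right\}\kappa(u)\,du \\
&+ \frac{\tau^{5}}{2}\int\left\{\ell^{(2)}(\theta).u^{\otimes 2} + (\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du \int uu^{T}\,\ell^{(1)}(\theta).u\;\kappa(u)\,du\,\Big| \le C\tau^{6}. \tag{39}
\end{align*}

We have $\int uu^{T}\kappa(u)\,du = \Sigma$ and, elementwise using (29),

\[
\int u_{i}u_{j}\;\ell^{(1)}(\theta).u\;\kappa(u)\,du = 0,
\qquad
\int u_{i}u_{j}\left\{\tfrac{1}{6}\,(\ell^{(1)}(\theta).u)^{3} + \tfrac{1}{2}\,(\ell^{(1)}(\theta).u)(\ell^{(2)}(\theta).u^{\otimes 2}) + \tfrac{1}{6}\,\ell^{(3)}(\theta).u^{\otimes 3}\right\}\kappa(u)\,du = 0,
\]

so that

\[
\Big|\,\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} - \tau^{2}\Sigma - \frac{\tau^{4}}{2}\int (uu^{T} - \Sigma)\left\{\ell^{(2)}(\theta).u^{\otimes 2} + (\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du\,\Big| \le C\tau^{6}.
\]

Now we look elementwise at the integral on the left hand side of the above equation. The element $(i,j)$ of the term is equal to

\[
\frac{\tau^{4}}{2}\int (u_{i}u_{j} - \sigma_{i,j})\left\{\ell^{(2)}(\theta).u^{\otimes 2} + (\ell^{(1)}(\theta).u)^{2}\right\}\kappa(u)\,du
= \frac{\tau^{4}}{2}\sum_{k=1}^{d}\sum_{l=1}^{d}\tilde{\ell}_{k,l}(\theta)\,(\Lambda_{i,j,k,l} - \sigma_{i,j}\sigma_{k,l})
\]

where we use the notation

\[
\tilde{\ell}_{k,l}(\theta) = \frac{\partial^{2}\ell(\theta)}{\partial\theta_{k}\,\partial\theta_{l}} + \frac{\partial\ell(\theta)}{\partial\theta_{k}}\,\frac{\partial\ell(\theta)}{\partial\theta_{l}}
\]

and $\Lambda_{i,j,k,l} = \int u_{i}u_{j}u_{k}u_{l}\,\kappa(u)\,du$. Because of Assumption 4, the element $(i,i)$ is equal to

\[
\frac{\tau^{4}}{2}\sum_{k=1}^{d}\sum_{l=1}^{d}\tilde{\ell}_{k,l}(\theta)\,(\Lambda_{i,i,k,l} - \sigma_{i}^{2}\sigma_{k,l})
= \frac{\tau^{4}}{2}\,\tilde{\ell}_{i,i}(\theta)\,(\Lambda_{i} - \sigma_{i}^{4})
= \tau^{4}\sigma_{i}^{4}\,\tilde{\ell}_{i,i}(\theta)
\]

since $\Lambda_{i,i,k,k} = \sigma_{i}^{2}\sigma_{k}^{2}$ for $i \neq k$. The element $(i,j)$, for $i \neq j$, is equal to

\[
\frac{\tau^{4}}{2}\sum_{k=1}^{d}\sum_{l=1}^{d}\tilde{\ell}_{k,l}(\theta)\,(\Lambda_{i,j,k,l} - \sigma_{i,j}\sigma_{k,l})
= \frac{\tau^{4}}{2}\sum_{k=1}^{d}\sum_{l=1}^{d}\tilde{\ell}_{k,l}(\theta)\,\Lambda_{i,j,k,l}
= \tau^{4}\sigma_{i}^{2}\sigma_{j}^{2}\,\tilde{\ell}_{i,j}(\theta)
\]

as $\Lambda_{i,j,k,l} = \sigma_{i,j}\sigma_{k,l}$ when $i, j, k$ are distinct. The result of the proposition follows. ■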
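Proof of Theorem 3. Proposition 10 yields

\[
\Big|\,\big\{\ell^{(2)}(\theta) + \ell^{(1)}(\theta)\,\ell^{(1)}(\theta)^{T}\big\} - \tau^{-4} B \circ \big[\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} - \tau^{2}\Sigma\big]\Big| \le C'\tau^{2} \tag{40}
\]

where $B \in \mathbb{R}^{d\times d}$ is the matrix such that $B_{i,j} = A_{i,j}^{-1}$ with $A$ given by (38). Writing $m_{\tau} = \mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)$ for brevity, we also have

\begin{align*}
\Big|\,\ell^{(1)}(\theta)\,\ell^{(1)}(\theta)^{T} - \tau^{-4}\Sigma^{-1} m_{\tau} m_{\tau}^{T}\Sigma^{-1}\Big|
&= \Big|\,\big\{\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1} m_{\tau}\big\}\,\ell^{(1)}(\theta)^{T} + \tau^{-2}\Sigma^{-1} m_{\tau}\big\{\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1} m_{\tau}\big\}^{T}\Big| \\
&\le \Big|\,\big\{\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1} m_{\tau}\big\}\,\ell^{(1)}(\theta)^{T}\Big| + \Big|\,\tau^{-2}\Sigma^{-1} m_{\tau}\big\{\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1} m_{\tau}\big\}^{T}\Big| \\
&\le \Big|\,\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1} m_{\tau}\Big|\,\Big(\big|\ell^{(1)}(\theta)\big| + \tau^{-2}\big|\Sigma^{-1} m_{\tau}\big|\Big).
\end{align*}

Using Theorem 2 yields

\[
\Big|\,\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\Big| \le C''\tau^{2}
\]

for some constant $C''$, and using the triangle inequality

\[
\Big|\,\tau^{-2}\Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\Big| \le \big|\ell^{(1)}(\theta)\big| + \Big|\,\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\Big| \le \big|\ell^{(1)}(\theta)\big| + C''\tau^{2}
\]

which leads to

\[
\Big|\,\ell^{(1)}(\theta)\,\ell^{(1)}(\theta)^{T} - \tau^{-4}\Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)^{T}\,\Sigma^{-1}\Big| \le C'''\tau^{2} \tag{41}
\]

for some constant $C'''$ when $\tau$ is small enough. Combining (40) and (41) we obtain

\[
\Big|\,\ell^{(2)}(\theta) - \tau^{-4}\Big( B \circ \big[\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} - \tau^{2}\Sigma\big] - \Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)^{T}\,\Sigma^{-1}\Big)\Big| \le C\tau^{2}. \tag{42}
\]

Now using Assumption 4,

\[
B \circ \big[\mathbb{E}_{\theta,\tau}\{(\Theta-\theta)(\Theta-\theta)^{T}\mid Y=y\} - \tau^{2}\Sigma\big] - \Sigma^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)\,\mathbb{E}_{\theta,\tau}(\Theta-\theta\mid Y=y)^{T}\,\Sigma^{-1}
= \Sigma^{-1}\big\{\mathrm{Cov}_{\theta,\tau}(\Theta\mid Y=y) - \tau^{2}\Sigma\big\}\Sigma^{-1}.
\]

Indeed for the $i$-th diagonal term we obtain

\[
(\sigma_{i}^{2})^{-2}\big[\mathbb{E}_{\theta,\tau}\{(\Theta_{i}-\theta_{i})^{2}\mid Y=y\} - \tau^{2}\sigma_{i}^{2}\big] - (\sigma_{i}^{2})^{-2}\big\{\mathbb{E}_{\theta,\tau}(\Theta_{i}-\theta_{i}\mid Y=y)\big\}^{2}
= (\sigma_{i}^{2})^{-2}\big\{\mathbb{V}_{\theta,\tau}(\Theta_{i}\mid Y=y) - \tau^{2}\sigma_{i}^{2}\big\}
\]

while the $(i,j)$ off-diagonal terms become

\[
(\sigma_{i}^{2}\sigma_{j}^{2})^{-1}\,\mathbb{E}_{\theta,\tau}\{(\Theta_{i}-\theta_{i})(\Theta_{j}-\theta_{j})\mid Y=y\} - (\sigma_{i}^{2}\sigma_{j}^{2})^{-1}\,\mathbb{E}_{\theta,\tau}(\Theta_{i}-\theta_{i}\mid Y=y)\,\mathbb{E}_{\theta,\tau}(\Theta_{j}-\theta_{j}\mid Y=y)
= (\sigma_{i}^{2}\sigma_{j}^{2})^{-1}\,\mathrm{Cov}_{\theta,\tau}(\Theta_{i},\Theta_{j}\mid Y=y)
\]

which concludes the proof. ■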
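The estimator appearing in this proof, $\tau^{-4}\Sigma^{-1}\{\mathrm{Cov}_{\theta,\tau}(\Theta\mid Y=y) - \tau^{2}\Sigma\}\Sigma^{-1}$, can be checked numerically on a scalar toy model. The sketch below is an illustration under stated assumptions, not the authors' implementation: one observation $y \sim N(\theta, 1/\lambda)$, a standard Gaussian kernel (so $\Sigma = 1$), and hypothetical function names. For this model the estimate equals $-\lambda/(1+\lambda\tau^{2})$ in closed form, against the exact second derivative $-\lambda$.

```python
import math


def artificial_posterior_variance(loglik, theta, tau, half_width=12.0, n=4801):
    """Var(Theta | Y = y) for Theta = theta + tau * U, U ~ N(0, 1).

    Uses a trapezoid rule in u to integrate against the artificial posterior.
    """
    step = 2.0 * half_width / (n - 1)
    base = loglik(theta)
    s0 = s1 = s2 = 0.0
    for i in range(n):
        u = -half_width + i * step
        w = math.exp(loglik(theta + tau * u) - base - 0.5 * u * u)
        if i == 0 or i == n - 1:
            w *= 0.5  # trapezoid end-point weights
        s0 += w
        s1 += w * u
        s2 += w * u * u
    mean_u = s1 / s0
    return tau ** 2 * (s2 / s0 - mean_u ** 2)


def hessian_estimate(loglik, theta, tau):
    """Derivative-free estimate tau^{-4} {Var(Theta | y) - tau^2} of l''(theta), Sigma = 1."""
    var = artificial_posterior_variance(loglik, theta, tau)
    return (var - tau ** 2) / tau ** 4


# Toy model (an assumption of this sketch): y ~ N(theta, 1 / lam).
y, theta, lam = 1.3, 0.4, 2.0
loglik = lambda t: -0.5 * lam * (y - t) ** 2
observed_information = -hessian_estimate(loglik, theta, 0.1)
```

The subtraction of $\tau^{2}\Sigma$ followed by division by $\tau^{4}$ amplifies any error in the covariance, which is why Monte Carlo versions of this estimator become noisy for small $\tau$.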

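Proof of Theorem 4. It is straightforward to check that Theorem 2 holds for the artificial Bayesian model of Section 3.2 under Assumptions 1, 2 and 5. Assumption 5 is necessary to ensure that developments as in (35) can be performed on the extended model. Hence for any $\theta \in \mathbb{R}^{d}$ there exist $\eta > 0$ and $C < \infty$ such that for all $0 < \tau < \eta$

\[
\Big|\,\bar{\ell}^{(1)}(\theta^{[T]}) - \tau^{-2}\Sigma_{T}^{-1}\,\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{1:T} - \theta^{[T]}\mid Y_{1:T} = y_{1:T}\big)\Big| \le C\tau^{2}.
\]

To prove the theorem we have to relate the $dT$-dimensional gradient $\bar{\ell}^{(1)}(\theta^{[T]})$ to the desired $d$-dimensional $\ell^{(1)}(\theta)$. We have $\ell(\theta) = \bar{\ell}(\theta^{[T]})$ and the chain rule yields

\[
\ell^{(1)}(\theta) = \sum_{t=1}^{T}\bar{\ell}_{t}^{(1)}(\theta^{[T]})
\]

where $\bar{\ell}_{t}^{(1)}(\theta_{1:T})$ denotes $\partial\bar{\ell}(\theta_{1:T})/\partial\theta_{t}$. We have

\begin{align*}
\Big|\,\ell^{(1)}(\theta) - \tau^{-2}\sum_{t=1}^{T}\big\{\Sigma_{T}^{-1}\,\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{1:T} - \theta^{[T]}\mid Y_{1:T} = y_{1:T}\big)\big\}_{t}\Big|
&\le \sum_{t=1}^{T}\Big|\,\bar{\ell}_{t}^{(1)}(\theta^{[T]}) - \tau^{-2}\big\{\Sigma_{T}^{-1}\,\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{1:T} - \theta^{[T]}\mid Y_{1:T} = y_{1:T}\big)\big\}_{t}\Big| \\
&\le \Big|\,\bar{\ell}^{(1)}(\theta^{[T]}) - \tau^{-2}\Sigma_{T}^{-1}\,\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{1:T} - \theta^{[T]}\mid Y_{1:T} = y_{1:T}\big)\Big| \le C\tau^{2}
\end{align*}

where $\{s\}_{t}$ denotes the entries $\{d(t-1)+1, \ldots, dt\}$ of a vector $s \in \mathbb{R}^{dT}$. Now because of the block structure of $\Sigma_{T}$, we have

\[
\sum_{t=1}^{T}\big\{\Sigma_{T}^{-1}\,\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{1:T} - \theta^{[T]}\mid Y_{1:T} = y_{1:T}\big)\big\}_{t}
= \Sigma^{-1}\sum_{t=1}^{T}\mathbb{E}_{\theta^{[T]},\tau}\big(\Theta_{t} - \theta\mid Y_{1:T} = y_{1:T}\big)
\]

and the result follows. ■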
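The chain-rule step of this proof, where the $d$-dimensional score is recovered by summing the $T$ blocks of the extended-model estimate, can be illustrated on a deliberately simple case. The sketch below assumes i.i.d. observations $y_{t} \sim N(\theta, 1)$ with no latent state, one perturbed copy $\Theta_{t} = \theta + \tau U_{t}$ per observation and $\Sigma = 1$. Under these assumptions the extended posterior factorises over $t$, which is not true of a genuine state-space model, where the smoothed expectations would be approximated by sequential Monte Carlo. All names and values are hypothetical.

```python
import math


def posterior_mean_shift(loglik_t, theta, tau, half_width=12.0, n=4801):
    """E(Theta_t - theta | y_t) for Theta_t = theta + tau * U, U ~ N(0, 1)."""
    step = 2.0 * half_width / (n - 1)
    base = loglik_t(theta)
    num = den = 0.0
    for i in range(n):
        u = -half_width + i * step
        w = math.exp(loglik_t(theta + tau * u) - base - 0.5 * u * u)
        if i == 0 or i == n - 1:
            w *= 0.5  # trapezoid end-point weights
        num += w * tau * u
        den += w
    return num / den


def extended_score_estimate(ys, theta, tau):
    """tau^{-2} Sigma^{-1} sum_t E(Theta_t - theta | y_{1:T}) with Sigma = 1.

    In this toy model each term depends on y_t only, because the
    observations are independent and there is no latent Markov state.
    """
    total = 0.0
    for y in ys:
        loglik_t = lambda t, y=y: -0.5 * (y - t) ** 2
        total += posterior_mean_shift(loglik_t, theta, tau)
    return total / tau ** 2


ys = [0.3, 1.1, -0.4, 0.9]
theta, tau = 0.2, 0.05
estimate = extended_score_estimate(ys, theta, tau)
exact = sum(y - theta for y in ys)  # exact score of the i.i.d. Gaussian model
```

Here the estimate equals the exact score divided by $1+\tau^{2}$, so the two agree up to an $O(\tau^{2})$ error, as the theorem predicts.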

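Proof of Theorem 6. Using the assumptions, Theorem 3 holds for the extended model. Hence for any $\theta \in \mathbb{R}^{d}$ there exist $\eta > 0$ and $C < \infty$ such that for all $0 < \tau < \eta$

\[
\Big|\,\bar{\ell}^{(2)}(\theta^{[T]}) - \tau^{-4}\Sigma_{T}^{-1}\big\{\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}\Sigma_{T}\big\}\Sigma_{T}^{-1}\Big| \le C\tau^{2}.
\]

To prove the theorem we have to relate the $dT \times dT$-dimensional Hessian $\bar{\ell}^{(2)}(\theta^{[T]})$ to the desired $d \times d$-dimensional Hessian $\ell^{(2)}(\theta)$. The chain rule yields

\[
\ell^{(2)}(\theta) = \sum_{s=1}^{T}\sum_{t=1}^{T}\bar{\ell}_{s,t}^{(2)}(\theta^{[T]})
\]

where $\bar{\ell}_{s,t}^{(2)}(\theta_{1:T})$ denotes $\partial^{2}\bar{\ell}(\theta_{1:T})/\partial\theta_{s}\,\partial\theta_{t}$. Hence we get for some $\eta > 0$, $C < \infty$ and any $0 < \tau < \eta$

\begin{align*}
&\Big|\,\sum_{t=1}^{T}\sum_{s=1}^{T}\bar{\ell}_{s,t}^{(2)}(\theta^{[T]}) - \tau^{-4}\sum_{t=1}^{T}\sum_{s=1}^{T}\Big[\Sigma_{T}^{-1}\big\{\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}\Sigma_{T}\big\}\Sigma_{T}^{-1}\Big]_{s,t}\Big| \\
&\qquad \le \sum_{t=1}^{T}\sum_{s=1}^{T}\Big|\,\bar{\ell}_{s,t}^{(2)}(\theta^{[T]}) - \tau^{-4}\Big[\Sigma_{T}^{-1}\big\{\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}\Sigma_{T}\big\}\Sigma_{T}^{-1}\Big]_{s,t}\Big| \\
&\qquad \le \Big|\,\bar{\ell}^{(2)}(\theta^{[T]}) - \tau^{-4}\Sigma_{T}^{-1}\big\{\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}\Sigma_{T}\big\}\Sigma_{T}^{-1}\Big| \le C\tau^{2}
\end{align*}

where $[M]_{s,t}$ denotes the $(s,t)$ $d$-dimensional block of a matrix $M \in \mathbb{R}^{dT \times dT}$, and by using the $L^{1}$-norm on matrices. Under our assumptions $\Sigma_{T}$ is block diagonal and each of its diagonal blocks is equal to $\Sigma$, therefore we obtain

\begin{align*}
\sum_{t=1}^{T}\sum_{s=1}^{T}\Big[\Sigma_{T}^{-1}\big\{\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}\Sigma_{T}\big\}\Sigma_{T}^{-1}\Big]_{s,t}
&= \Sigma^{-1}\Big\{\sum_{t=1}^{T}\sum_{s=1}^{T}\big[\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{1:T}\mid Y_{1:T} = y_{1:T}\big)\big]_{s,t} - \tau^{2}T\Sigma\Big\}\Sigma^{-1} \\
&= \Sigma^{-1}\Big\{\sum_{t=1}^{T}\sum_{s=1}^{T}\mathrm{Cov}_{\theta^{[T]},\tau}\big(\Theta_{s},\Theta_{t}\mid Y_{1:T} = y_{1:T}\big) - \tau^{2}T\Sigma\Big\}\Sigma^{-1}
\end{align*}

and the result follows. ■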

Proof of Proposition 7. From Assumption 6, the transition kernel of the latent Markov state $(\Theta_{t}, X_{t})$ given $(\Theta_{t-1}, X_{t-1}) = (\theta_{t-1}, x_{t-1})$ satisfies



■M n(det,dxt) 



where 



< r-^n ^ 



(d9t,dxt] 



dOt 



f (xt\xt-i;dt] X{dxt) < a{d) fi (ddt,dxt] 



r-'^ "-^ 



dOt \{dxt) 



This ensure that the forward and backward smoothing kernels \ Pg[T] ^ I d9t+i, dxt+i 
and -^ Pg[T]_7. ( d9t,dxt 



Qt = 9t,Xt = Xt,Yi.,T = 2/1: 



T-l 



■)};: 



Qt+i = 9t+i, Xt+i = Xt-\.i, Yi-T = yi:T ) ( are uniformly ergodic with mixing constant 
$\rho(\theta)$; see e.g. [6, chapter 4], [16, Theorem 3.1]. For the score, we thus easily obtain that

T^\^{Sr,A.T{G)-Sr,Tm\ 



T-l-A 



- Z^ Pe[rl,r (0* ^1:T — yi-.TJ - ^em^r ( Qt ^l:(t+A)AT = yi:(t+A)AT) 
t=l 

<2d(6i,T) p(6')'^ (T-l-A) 

where the last inequality follows from the ergodicity of the backward smoothing kernels. This yields (23). We now write $t(\Delta)$ for $(t+\Delta)\wedge T$. For the covariance, we have



< 2 



T T 

s=l t=l + (s+A)AT 
< ^CoWglT]^ f 6t,0t yi:T = yi:TJ - CoVglT].^ f 6t,0, 

U=i 

T s(A) 
2^ ^ C0We[Tl_^ (^O^.e* Yl-.T = yi:TJ - C0Vg[T]^^ (^6,s,6t ?i:t(A) = 2/l:t(A)j 



Fi 



l:t(A) — yi:t{A) 



s=l t=s+l 

For the first term we have 

T T 

s=l t=l+s{A) 
T-A-1 T 

- H Yl \^OVgm,r[Qs,Qt 

s=l t=l+s(A) 

where 

C0Vg[T].^(Qs,Qt yi:T = yi:T 



T-A-1 



yi:T = yi:T 

Yl-.T = yi:T 



s=l t=l + (s+A)AT 



yi:T = 2/1:1 



Eem,.{e,Eem,. (e^ 



X5,es,yi:T = J/1:T 



J/1 



TJ-Eem,. (e, 



yi:T = yi:T 



^em,r I Q* 



>'l:T = 2/1:3 



Egm,. [e, {e (e^l X„e„ Yi:t = yi-.r) - ^em^r (e^| Yi:t = yi-.r) } 



^l:T = yi:T 



< 



< 



'-em. 



-em,T 



Qs {JEem^, (et| X,, O^Fi^t = yi:T) - Egm,. (et| Yi.,t = 2/1:t) |} 



yi:T = yi:T 



e. 



>1:T = J/i:T 2d [9, t) p'-' < 2d (6, tY p {6) 



Nt-S 



where the last line of inequalities follows from the ergodicity of the forward smoothing kernels. Hence we have 

T T T-A-l T 

'Jg[T] ^ ( 6s,6t Yl-.T = yi:T 



2E E 

s=l t=l+s(A) 



s=l t=l+s(A) 



t-s 



1+A 



<Ad{0,TY 



m 

i-p{e) 



(T-A-l) 



(43) 
We are now interested in upper bounding

T 



^€.ovg[T\,^ [Qt.Qt Yi,T ^yi-.Tj -CowgiT] .^ (^6i,6t yi:(t+A)rxT = 2/i:(t+A)AT 



T (s+A)aT 



s=l t=s+l 


(e^e* 


T 

< > ; Covgm,,(QuQt 
t—i 


?i:T = yi 


T (s+A)AT ^ 


. (e„et 



We have for s < t 



Yl;T = yi:TJ - CoUg[Tl^^ (^6s,6t Yl:(t+A)AT == 2/l:(t+A)ATJ 



j - <C0Vq[t]^^ {Qt,Qt yi:(t+A)AT = yi:(t+A)ATJ 



^LiT = yiiTJ -C0Ug[T]_^ (9^,64 yi.(t+A)AT = yi:(t+A)AT j 



Cowgin ,. f 8s,6t Yi.,T = yi:TJ - Cowg[Ti_T- f 65,6^ 



^1 



l:t(A) — 2/l:t(A) 



< 



6.6/ 



n:T = yi:T) -E0[T],, (e,e;^ 



yi:t(A) = yi:t{A) 



6, 



-i,r(e 



^elTlriQ 



Yl-.T == yi:T 



E, 



em.r I 0* 



yi:T = VI-.t] -Eg[Tl,^ 6 



^l:t(A) — yi:t(A) 



So we have 

T 



A 



6. 



^1 



l:t(A) — 2/l:t(A) 



E« 



6, 



^l:t(A) — 2/l:t(A) 



^Coug[Ti^^ (^0t,6t yi:T == 2;i:tJ -Coi;0[T],^ (^6t,6t yi:t(A) = yi:t{A) j 



T s(A) 
s=l t=s+l 



^llT = 2/1:tJ - <C0Vg[T].^ [Qs, 6t Yl:t(A) = yi:t(A)j 



<M[e,Tf p[e)^ + i2d{e,Tf p{e)'^A. 



(44) 



The bound (24) follows by adding (43) to (44). ■

Proof of Proposition 8. The bounds on the score vector given in (27) and (25) follow directly from Proposition A1 in [16]. The bounds on the observed information matrix estimator are obtained as follows. We use $\widehat{\mathbb{E}}_{\theta^{[T]},\tau}$ and $\widehat{\mathrm{Cov}}_{\theta^{[T]},\tau}$ for the bootstrap filter approximations of expectation and covariance. Hence we have

T^\E[i:{i^^^^r{d)'irA,T{0)}m 



Y, 



^ |E|C0We[r].^ (^64,64 yL:4(A) =J/l:t(A)j - CoVglT] .^ \^Qt, Qt 
t=l 

T s(A) 

2^ ^ \'E^Cove[T]^^y&s,Qt ?i:t(A) = yi:t(A)j -Cowe[T]_^ (^65,64 



l:t(A) — 2/l:t(A) 



Y 



l:t(A) = yi:t(A) 



)} 
where, using $\hat{a}\hat{b} - ab = a(\hat{b} - b) + (\hat{a} - a)(\hat{b} - b) + b(\hat{a} - a)$, we have



E- 



< EJiem,. (e^e^l Y,.,t^A) = yi:t(A)) - £em.. (e.e?| %.t(A) = 2/i:t(A)) } 

n,r[^s ?i:t(A) =yi:i(A)j E|Ee[r]^^ (^e^ ?i:t(A) =yi:t(A)j - ^em ^r [^I Yl:t{A) =yi:t(A)j| 
|Eg[T]^^ (©s Yl-.tiA) =2/l:t(A)j -Eg[T]^^ f 6s yL:t(A) =yi:t(A))j^9lT]^T i^I Yl:t{A) = 2/l:t(A)J 

IE JEem^^ i^Qs Yi,t{A} = yi:t{A)J - ^9m,r (©^ ?i:t(A) = yi:t{A)J I 
|lEem,r (Q?' ?i:i(A) = 2/i:t(A)j - Eem^^ (^8^ ?i:t(A) = 2/i:t(A)j I 

|Ee[Ti_^ (^e^e^ ?i:t(A) ==2/i:t(A)j -]Eem,T (©sB^ ?i:t(A) = 2;i:t(A)j || 

?i:t(A) =2/l:t(A)j -^9^ ,. [^Qt Yi,t(A) =yi:t(A)j|| 

yi:t(A) == yi:t{A)) - ^em.r (©s ?i:t(A) = 2/l:t(A) j | 

1/2 



< 



+ d(0,T)|E{E0[T],, ( 
+ (i(6l,r) e|e0[t],^( 



e. 



e. 



E 



X E 



|Ee[T]_T- y&s Yi-ti^A) = yi:t{A}j - ^em^T (© 



Fi. 



)}■ 



^l:t(A) = yi:t{A) j - Eem.r ( ©t 



l:t(A) = yi:t(A) 
Yl;t{A} = 2/l:t(A)j| 



1/2 



Hence, using Proposition A1 from [16], we have



t4|E[E{J,^^^j,(0)-/.,a,tW}S] 



<di0,TY 



T s(A) 



Y,{3Ct{e) + DUo)} + 2j2 E {Cs{e) + 2C,i9) + DUo)Dno)} 



t=l s=l t=s+l 

The bound (26) follows. Similarly we have by Minkowski's inequality



ryp[\i:{l^^A,T{0)-Ir,A,Tid)}^ 
T 

t=i 

T s(A) 

+2EEi^[ 



5a:t(A) = yi:t(A) ) - C0Vg[T]^^ ( Ot yL:t(A) = yi:t(A) 



1/p 



m ^ I 6s, 9t 



yi:t(A) = yi:t{A)] - C0Vg[T]^^ f 6s, 6t yi:t(A) = yi:t(A) 



1/p 



where, using $\hat{a}\hat{b} - ab = \hat{a}(\hat{b} - b) + (\hat{a} - a)b$, we have



E 



< E 
+ E 
+ E 

< E 



CovgiT].^ f 6s,et 

EglTl 

E(e 



r I QsQj 



Yi..t(A) = yi:t(A)j " CoVglT].^ i^Qs, 6t 

T 



Yl: 



^l:t(A) = yi:t(A) 

^l:t(A) = 2/l:t(A) ] - ^0^ ^t { ©s©/ Yi-j-j^A) = yi:t(A) 
V 



1/p 



1/p 



(t+A)AT 



E«[Ti ^ e 



E, 



'em 



A 



e. 



Y 



l:t(A) — yi:t(A) 



^l:t(A) = yi:t(A) ) - ^0\T].r ( ©t ^l:t(A) — yi:t(A) 

P 



1/p 



^em^ri^s 



Y, 



l:t(A) — yi:t(A) 



E 



em 



E, 



(^OsOt^ yi:t(A) =2/l:t(A)j -]Ee[T],^ [©sB^ yL:t{A) =2;i:t(A)j 



em.T '^s'^t 



+ d {6, t) E Eg[T]^^ i^Qt yi:t(A) = yi:t(A) j - lEem,^ (^64 yi:t(A) = yi:t(A) j 



rf(6i,r)E 



Eem., 6 



i^l:t(A) = 2;i:t{A) ) - ^0[T].T ( ©s yi:t(A) = 2/l:t(A) 



1/p 



1/p 



1/p 



Y 



l:(t+A)AT 



1/p 
Hence the bound (28) follows from Proposition A1 in [16]:



r%i/P 



S«A,TW-^r,A,T(0)}S|'' <d{9,T) 



T T s(A) 

J23DnO) + 2Y^ ^ {DU0) + 2D^{e)} 

t=l s=l t=s+l 



B Optimal convergence rates 

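The estimators

\[
\tau^{-2}\Sigma^{-1}\,\hat{\mu}_{N,\tau}(\theta)
\qquad\text{and}\qquad
\tau^{-4}\Sigma^{-1}\big\{\hat{\nu}_{N,\tau}(\theta) - \tau^{2}\Sigma\big\}\Sigma^{-1},
\]

where $\hat{\mu}_{N,\tau}(\theta)$ (resp. $\hat{\nu}_{N,\tau}(\theta)$) is an importance sampling estimate of $\mathbb{E}_{\theta,\tau}(\Theta - \theta\mid Y = y)$ (resp. $\mathrm{Cov}_{\theta,\tau}(\Theta\mid Y = y)$), verify

\begin{align*}
\mathbb{E}\{\hat{\mu}_{N,\tau}(\theta)\} &= \mathbb{E}_{\theta,\tau}(\Theta - \theta\mid Y = y) + \frac{a}{N}, &
\mathbb{V}\{\hat{\mu}_{N,\tau}(\theta)\} &= \frac{b}{N}\,\tau^{2}, \\
\mathbb{E}\{\hat{\nu}_{N,\tau}(\theta)\} &= \mathrm{Cov}_{\theta,\tau}(\Theta\mid Y = y) + \frac{c}{N}, &
\mathbb{V}\{\hat{\nu}_{N,\tau}(\theta)\} &= \frac{d}{N}\,\tau^{4}
\end{align*}

for some $a, b, c, d$ independent of $\tau^{2}$ and $N$. These conditions are for example verified asymptotically in $N$ if $\hat{\mu}_{N,\tau}(\theta)$ and $\hat{\nu}_{N,\tau}(\theta)$ are importance sampling estimators using the artificial prior as importance distribution.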
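Hence the mean squared error for $\ell^{(1)}(\theta)$ satisfies

\[
\mathbb{E}\Big[\big|\,\ell^{(1)}(\theta) - \tau^{-2}\Sigma^{-1}\hat{\mu}_{N,\tau}(\theta)\big|^{2}\Big] \le \frac{e}{\tau^{2}N} + f\tau^{4}
\]

for some $e, f > 0$. This upper bound is minimised for $\tau$ of order $N^{-1/6}$ and is then of order $N^{-2/3}$. Similarly the mean squared error for $\ell^{(2)}(\theta)$ is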



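\[
\mathbb{E}\Big[\big|\,\ell^{(2)}(\theta) - \tau^{-4}\Sigma^{-1}\big\{\hat{\nu}_{N,\tau}(\theta) - \tau^{2}\Sigma\big\}\Sigma^{-1}\big|^{2}\Big] \le \frac{g}{\tau^{4}N} + h\tau^{4}
\]

for some $g, h > 0$. The upper bound is minimised when $\tau$ is of order $N^{-1/8}$ and is then of order $N^{-1/2}$. Hence we find that these estimators have the same optimal rate of convergence in terms of the mean squared error as the finite difference estimators.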
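The optimal orders of $\tau$ quoted above can be recovered by minimising the two upper bounds in closed form. The sketch below assumes the bounds take the forms $e/(\tau^{2}N) + f\tau^{4}$ for the score and $g/(\tau^{4}N) + h\tau^{4}$ for the observed information matrix. These forms are consistent with the stated rates but are an assumption of this example, and the constants are unknown in practice, so the functions are hypothetical helpers rather than a tuning rule.

```python
def score_mse_bound(tau, e, f, n_particles):
    """Assumed bound e / (tau^2 N) + f tau^4 on the score estimator's MSE."""
    return e / (tau ** 2 * n_particles) + f * tau ** 4


def optimal_tau_score(e, f, n_particles):
    """Minimiser of score_mse_bound in tau: tau^6 = e / (2 f N)."""
    return (e / (2.0 * f * n_particles)) ** (1.0 / 6.0)


def information_mse_bound(tau, g, h, n_particles):
    """Assumed bound g / (tau^4 N) + h tau^4 on the information estimator's MSE."""
    return g / (tau ** 4 * n_particles) + h * tau ** 4


def optimal_tau_information(g, h, n_particles):
    """Minimiser of information_mse_bound in tau: tau^8 = g / (h N)."""
    return (g / (h * n_particles)) ** (1.0 / 8.0)


# Illustrative constants only; they are not taken from the paper.
e, f, g, h = 2.0, 3.0, 1.5, 0.7
N = 1000
tau_score = optimal_tau_score(e, f, N)
tau_info = optimal_tau_information(g, h, N)
```

Setting the derivative of each bound to zero gives $\tau \propto N^{-1/6}$ and $\tau \propto N^{-1/8}$ respectively, and substituting back yields bounds of order $N^{-2/3}$ and $N^{-1/2}$.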